To Elaine


This book is intended to complement my Elements of Algebra, and it 
is similarly motivated by the problem of solving polynomial equations. 
However, it is independent of the algebra book, and probably easier. In 
Elements of Algebra we sought solution by radicals, and this led to the 
concepts of fields and groups and their fusion in the celebrated theory of 
Galois. In the present book we seek integer solutions, and this leads to the 
concepts of rings and ideals which merge in the equally celebrated theory 
of ideals due to Kummer and Dedekind. 

Solving equations in integers is the central problem of number theory, 
so this book is truly a number theory book, with most of the results found 
in standard number theory courses. However, numbers are best understood 
through their algebraic structure, and the necessary algebraic concepts— 
rings and ideals—have no better motivation than number theory. 

The first nontrivial examples of rings appear in the number theory 
of Euler and Gauss. The concept of ideal—today as routine in ring the- 
ory as the concept of normal subgroup is in group theory—also emerged 
from number theory, and in quite heroic fashion. Faced with failure of 
unique prime factorization in the arithmetic of certain generalized “integers”,
Kummer created in the 1840s a new kind of number to overcome
the difficulty. He called them “ideal numbers” because he did not know 
exactly what they were, though he knew how they behaved. Dedekind in 
1871 found that these “ideal numbers” could be realized as sets of actual 
numbers, and he called these sets ideals. 

Dedekind found that ideals could be defined quite simply; so much so 
that a student meeting the concept today might wonder what all the fuss 
is about. It is only in their role as “ideal numbers”, where they realize 
Kummer’s impossible dream, that ideals can be appreciated as a genuinely 
brilliant idea.




Thus solution in integers—like solution by radicals—is a superb set- 
ting in which to show algebra at its best. It is the right place to introduce 
rings and ideals and the right place first to apply them. It even gives an 
opportunity to introduce some exotic rings, such as the quaternions, which 
we use to prove Lagrange’s theorem that every natural number is the sum 
of four squares. 

The book is based on two short courses (about 20 lectures each) given 
at Monash University in recent years; one on elementary number theory 
and one on ring theory with applications to algebraic number theory. Thus 
the amount of material is suitable for a one-semester course, with some 
variation possible through omission of the optional starred sections. A 
slower-paced course could stop at the end of Chapter 9, at which point 
most of the standard results have been covered, from Euclid’s theorem that 
there are infinitely many primes to quadratic reciprocity. 

It should be stressed, however, that this is not meant to be a standard 
number theory course. I have tried to avoid the ad hoc proofs that once 
gave number theory a bad name, in favor of unifying ideas that work in 
many situations. These include algebraic structures but also ideas from 
elementary number theory, such as the Euclidean algorithm and unique 
prime factorization. In particular, I use the Euclidean algorithm as a bridge
to Conway’s visual theory of quadratic forms, which offers a new approach 
to the Pell equation. 

There are exercises at the end of almost every section, so that each 
new idea or proof receives immediate reinforcement. Some of them focus 
on specific ideas, while others recapitulate the general line of argument (in 
easy steps) to prove a similar result. The purpose of each exercise should be 
clear from the accompanying commentary, so instructors and independent 
readers alike will be able to find an enjoyable path through the book. 

My thanks go to the Monash students who took the courses on which 
the book is based. Their reactions have helped improve the presentation in 
many ways. I am particularly grateful to Ley Wilson, who showed that it
is possible to master the book by independent study. 

Special thanks go to my wife Elaine, who proofread the first version 
of the book, and to John Miller and Abe Shenitzer, who carefully read the 
revised version and saved me from many mathematical and stylistic errors. 


JOHN STILLWELL 
South Melbourne, July 2002 
PREVIEW


Counting is presumably the origin of mathematical thought, and it is 
certainly the origin of difficult mathematical problems. As the great 
Hungarian problem-solver Paul Erdős liked to point out, if you can
think of an open problem that is more than 200 years old, then it is 
probably a problem in number theory. 


In recent decades, difficulties in number theory have actually be- 
come a virtue. Public key encryption, whose security depends on 
the difficulty of factoring large numbers, has become one of the 
commonest applications of mathematics in daily life. 

At any rate, problems are the life blood of number theory, and the 
subject advances by building theories to make them understandable. 
In the present chapter we introduce some (not so difficult) problems 
that have played an important role in the development of number 
theory because they lead to basic methods and concepts. 


• Counting leads to induction, the key to all facts about numbers,
from banalities such as a + b = b + a to the astonishing
result of Euclid that there are infinitely many primes.

• Division (with remainder) is the key computational tool in Euclid’s
proof and elsewhere in the study of primes.

• Binary notation, which also results from division with remainder,
leads in turn to a method of “fast exponentiation” used in
public key encryption.

• The Pythagorean equation x^2 + y^2 = z^2 from geometry is equally
important in number theory because it has integer solutions.


2 1 Natural numbers and integers 


In this chapter we are content to show these ideas at work in a few
interesting but seemingly random situations. Later chapters will de- 
velop the ideas in more depth, showing how they unify and explain 
a great many astonishing properties of numbers. 


1.1 Natural numbers 


Number theory starts with the natural numbers 
1,2,3,4,5,6,7,8,9,..., 


generated from 1 by successively adding 1. We denote the set of natural 
numbers by N. On N we have the operations + and ×, which are simple in
themselves but lead to more sophisticated concepts. 

For example, we say that a divides n if n = ab for some natural numbers
a and b. A natural number p is called prime if the only natural numbers
dividing p are 1 and p itself.

Divisibility and primes are behind many of the interesting questions in 
mathematics, and also behind the recent applications of number theory (in 
cryptography, internet security, electronic money transfers etc.). 

The sequence of prime numbers begins with 


2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, ...


and continues in a seemingly random manner. There is so little pattern in 
the sequence that one cannot even see clearly whether it continues forever. 
However, Euclid (around 300 BCE) proved that there are infinitely many 
primes, essentially as follows. 


Infinitude of primes. Given any primes p_1, p_2, p_3, ..., p_k we can always
find another prime p.


Proof. Form the number

N = p_1 p_2 p_3 ⋯ p_k + 1.

Then none of the given primes p_1, p_2, p_3, ..., p_k divides N because they all
leave remainder 1. On the other hand, some prime p divides N. If N itself
is prime we can take p = N, otherwise N = ab for some smaller numbers a
and b. Likewise, if either a or b is prime we take it to be p, otherwise split
a and b into smaller factors, and so on. Eventually we must reach a prime
p dividing N because natural numbers cannot decrease forever. □
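
The construction in the proof can be tried out numerically. The following is only a sketch: the function names are ours, not the book's, and the prime divisor is found by plain trial division rather than by the descent described in the proof.

```python
def smallest_prime_factor(n):
    """Return the smallest prime dividing n (for n >= 2), by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime


def another_prime(primes):
    """Given a list of primes, return a prime that is not in the list."""
    N = 1
    for p in primes:
        N *= p
    N += 1  # each given prime now leaves remainder 1 when dividing N
    return smallest_prime_factor(N)


# 2*3*5*7*11*13 + 1 = 30031 = 59 * 509, so N need not be prime itself.
print(another_prime([2, 3, 5, 7, 11, 13]))  # 59
```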




Exercises 


Not only is the sequence of primes without apparent pattern, there is not even 
a known simple formula that produces only primes. There are, however, some 
interesting “near misses”. 


1.1.1 Check that the quadratic function n^2 + n + 41 is prime for all small values
of n (say, for n up to 30).


1.1.2 Show nevertheless that n^2 + n + 41 is not prime for certain values of n.


1.1.3 Which is the smallest such value?


1.2 Induction 


The method just used to find the prime divisors of N is sometimes called 
descent, and it is an instance of a general method called induction. 

The “descent” style of induction argument relies on the fact that any 
process producing smaller and smaller natural numbers must eventually 
halt. The process of repeatedly adding 1 reaches any natural number n in a
finite number of steps, hence there are only finitely many steps downward 
from n. There is also an “ascent” style of induction that imitates the con- 
struction of the natural numbers themselves—starting at some number and 
repeatedly adding 1. 

An “ascent” induction proof is carried out in two steps: the base step 
(getting started) and the induction step (going from n to n + 1). Here is an
example: proving that any number of the form k^3 + 2k is divisible by 3.


Base step. The claim is true for k = 1 because 1^3 + 2 × 1 = 3, which is
certainly divisible by 3.

Induction step. Suppose that the claim is true for k = n, that is, 3
divides n^3 + 2n. We want to deduce that it is true for k = n + 1, that is, that
3 divides (n + 1)^3 + 2(n + 1). Well,


(n + 1)^3 + 2(n + 1)
= n^3 + 3n^2 + 3n + 1 + 2n + 2
= n^3 + 2n + 3n^2 + 3n + 3
= n^3 + 2n + 3(n^2 + n + 1).

And the right-hand side is the sum of n^3 + 2n, which we are supposing
to be divisible by 3, and 3(n^2 + n + 1), which is obviously divisible by 3.
Therefore (n + 1)^3 + 2(n + 1) is divisible by 3, as required. □




Induction is fundamental not only for proofs of theorems about N but 
also for defining the basic functions on N. Only one function needs to be 
assumed, namely the successor function s(n) = n + 1; then + and × can be
defined by induction. In this book we are not trying to build everything up
from bedrock, so we shall assume + and × and their basic properties, but
it is worth mentioning their inductive definitions, since they are so simple.

For any natural number m we define m + 1 by


m + 1 = s(m).

Then, given the definition of m + n for all m, we define m + s(n) by

m + s(n) = s(m + n).


It then follows, by induction on n, that m + n is defined for all natural numbers
m and n. The definition of m × n is similarly based on the successor
function and the + function just defined:


m × 1 = m

m × s(n) = m × n + m.


From these inductive definitions one can give inductive proofs of the basic
properties of + and ×, for example m + n = n + m and l(m + n) = lm + ln.
Such proofs were first given by Grassmann (1861) (in a book intended
for high school students!) but they went unnoticed. They were rediscovered,
together with an analysis of the successor function itself, by Dedekind
(1888). For more on this see Stillwell (1998), Chapter 1. 
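
As an illustration only, the inductive definitions can be transcribed into recursive functions. The names s, pred, add and mul are ours, and pred (which undoes one step of s) is a helper that does not appear in the text. Python's built-in arithmetic is used solely to implement s and pred.

```python
def s(n):
    """Successor function, the one operation that is assumed."""
    return n + 1


def pred(n):
    """Hypothetical helper: the number whose successor is n (for n > 1)."""
    return n - 1


def add(m, n):
    """m + 1 = s(m);  m + s(k) = s(m + k)."""
    if n == 1:
        return s(m)
    return s(add(m, pred(n)))


def mul(m, n):
    """m x 1 = m;  m x s(k) = m x k + m."""
    if n == 1:
        return m
    return add(mul(m, pred(n)), m)


print(add(3, 4), mul(3, 4))  # 7 12
```

Checking add(m, n) == add(n, m) over a small range is of course no substitute for the inductive proof, but it shows the definitions behaving as expected.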


Exercises 


An interesting process of descent may be seen in the algorithm for the so-called
Egyptian fractions introduced by Fibonacci (1202). The goal of the algorithm is
to represent any fraction b/a with 0 < b < a as a sum of distinct terms 1/n, called unit
fractions. (The ancient Egyptians represented fractions in this way.)

Fibonacci’s algorithm, in a nutshell, is to repeatedly subtract the largest possible
unit fraction. Applied to the fraction 11/12, for example, it yields

11/12 − 1/2 = 5/12, subtracting the largest unit fraction, 1/2, less than 11/12,

5/12 − 1/3 = 1/12, subtracting the largest unit fraction, 1/3, less than 5/12,

hence 11/12 = 1/2 + 1/3 + 1/12.




It turns out that the fractions produced by the successive subtractions always have
a descending sequence of numerators (11, 5, 1 in the example), hence they necessarily
terminate with 1.
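
A minimal sketch of Fibonacci's algorithm, using exact rational arithmetic from Python's standard fractions module. The function name egyptian is our own choice, and the code takes the largest unit fraction not exceeding the current fraction, so that a unit fraction is removed in a single step.

```python
from fractions import Fraction


def egyptian(frac):
    """Return the denominators of the unit fractions found greedily."""
    denominators = []
    while frac > 0:
        # Smallest n with 1/n <= frac, that is, n = ceil(denominator/numerator).
        n = -(-frac.denominator // frac.numerator)
        denominators.append(n)
        frac -= Fraction(1, n)
    return denominators


print(egyptian(Fraction(11, 12)))  # [2, 3, 12]
```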


1.2.1 Use Fibonacci’s algorithm to find an Egyptian representation of a 


1.2.2 If a, b, q are natural numbers with 1/(q + 1) < b/a < 1/q, show that

b/a − 1/(q + 1) = b'/(a(q + 1)), where 0 < b' < b.


Hence explain why Fibonacci’s algorithm always works. 


1.3 Integers 


For several reasons, it is convenient to extend the set N of natural numbers
to the group Z of integers by throwing in the identity element 0 and an
inverse −n for each natural number n. One reason for doing this is to
ensure that the difference m − n of any two integers is meaningful. Thus
Z is a set on which all three operations +, −, and × are defined. (The
notation Z comes from the German “Zahlen”, meaning “numbers”.)

Z is an abelian group under the operation +, because it has the three
group properties:


Associativity: a + (b + c) = (a + b) + c
Identity: a + 0 = a
Inverse: a + (−a) = 0


and also the abelian property: a + b = b + a.

Z is much older than the concept of abelian group. The latter concept
could only be conceived after other examples came to light, particularly
finite abelian groups. We shall meet some of them in Chapter 3.

Z is a ring under the operations + and ×: it is an abelian group under
+ and × is linked with + by


Distributivity: a(b + c) = ab + ac.


The ring concept also emerged much later than Z. It grew out of 18th and 
19th century attempts to generalize the concept of integer. We see one of 
these in Section 1.8, and take up the general ring concept in Chapter 10. 
The ring properties show that Z has more structure than N, though it 
must be admitted that this does not make everything simpler. The presence 




of the negative integers −1, −2, −3, ... in Z slightly complicates the concept
of prime number. Since any integer n is divisible by 1, −1, n and −n,
we have to define a prime in Z to be an integer p divisible only by ±1 (the
so-called units of Z) and ±p.

In general, however, it is simpler to work with integers than natural 
numbers. Here is a problem that illustrates the difference. 


Problem. Describe the numbers 4m + 7n 
1. where m and n are natural numbers, 


2. where m and n are integers. 


In the first case the numbers are 11, 15, 18, 19, 22, 23, 25, 26, 27 and all
numbers ≥ 29. The numbers < 29 can be verified (laboriously, I admit) by
trial. To see why all numbers ≥ 29 are of the form 4m + 7n, we first verify
this for 29, 30, 31, 32; namely


29 = 2 × 4 + 3 × 7
30 = 4 × 4 + 2 × 7
31 = 6 × 4 + 1 × 7
32 = 1 × 4 + 4 × 7.


Then we can get the next four natural numbers by adding one more 4 to
each of these, then the next four by adding two more 4s, and so on (this is
really an induction proof).

In the second case, all integers are obtainable. This is simply because
1 = 4 × 2 − 7, and therefore n = 4 × 2n − 7 × n, for any integer n.
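
The laborious trial for the first case is easily delegated to a machine. The sketch below enumerates 4m + 7n up to an arbitrary bound (LIMIT is our own choice) and then checks the integer case for a few sample values.

```python
LIMIT = 60  # arbitrary search bound, chosen only for illustration

# Case 1: m and n are natural numbers (both at least 1).
natural = sorted({4 * m + 7 * n
                  for m in range(1, LIMIT) for n in range(1, LIMIT)
                  if 4 * m + 7 * n <= LIMIT})
print([v for v in natural if v < 29])  # [11, 15, 18, 19, 22, 23, 25, 26, 27]
print(all(v in natural for v in range(29, LIMIT + 1)))  # True

# Case 2: m and n are integers. Since 1 = 4*2 - 7, the choice
# m = 2k, n = -k represents any integer k.
for k in range(-5, 6):
    m, n = 2 * k, -k
    assert 4 * m + 7 * n == k
```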


This type of problem is easier to understand with the help of the gcd— 
greatest common divisor—which we study in the next chapter. But first we 
need to look more closely at division, particularly division with remainder, 
which is the subject of the next section. 


Exercises 


A concrete problem similar to describing 4m + 7n is the McN*ggets problem:
given that McN*ggets can be bought in quantities of 6, 9 or 20, which numbers
of McN*ggets can be bought? This is the problem of describing the numbers
6i + 9j + 20k for natural numbers or zero i, j and k.

It turns out the possible numbers include all numbers ≥ 44, and an irregular
set of numbers < 43.




1.3.1 Explain why the number 43 is not obtainable. 
1.3.2 Show how each of the numbers 44, 45, 46, 47, 48, 49 is obtainable. 
1.3.3. Deduce from Exercise 1.3.2 that any number > 43 is obtainable. 


But if the negative quantities —6, —9 and —20 are allowed (say, by selling 
McN*ggets back), then any integer number of McN*ggets can be obtained. 


1.3.4 Show in fact that 1 = 9m + 20n for some integers m and n.


1.3.5 Deduce from Exercise 1.3.4 that every integer is expressible in the form
9m + 20n, for some integers m and n.


1.3.6 Is every integer expressible in the form 6m + 9n? What do the results in
Exercises 1.3.4 and 1.3.5 have to do with common divisors? 


1.4 Division with remainder 
As mentioned in Section 1.1, a natural number b is said to divide n if n = bc
for some natural number c. We also say that b is a divisor of n, and that n
is a multiple of b. The same definitions apply wherever there is a concept
of multiplication, such as in Z.

In N or Z it may very well happen that b does not divide a, for example,
4 does not divide 23. In this case we are interested in the quotient q and
remainder r when we do division of a by b. The quotient comes from the
greatest multiple qb of b that is ≤ a, and the remainder is a − qb. For
example

23 = 5 × 4 + 3,


so when we divide 23 by 4 we get quotient 5 and remainder 3.

The remainder r = a − qb may be found by repeatedly subtracting b
from a. This gives natural numbers a, a − b, a − 2b, ..., which decrease and
therefore include a least member r = a − qb ≥ 0 by descent. Then r < b,
otherwise we could subtract b again. The remainder r < b is also evident in
Figure 1.1, which shows a lying between successive multiples of b, hence
necessarily at distance < b from the nearest such multiple, qb.


Figure 1.1: Division with remainder (a number line marked 0, b, 2b, ..., qb, a, (q + 1)b, with r the distance from qb to a)
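
The repeated-subtraction description of the remainder can be written out directly. This is a sketch for natural numbers only, with a function name of our choosing. Python's built-in divmod gives the same pair far more quickly.

```python
def divide_with_remainder(a, b):
    """Return (q, r) with a = q*b + r and 0 <= r < b, for a >= 0 and b >= 1."""
    q, r = 0, a
    while r >= b:  # b can still be subtracted
        r -= b
        q += 1
    return q, r


print(divide_with_remainder(23, 4))  # (5, 3)
```
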
Important. The main purpose of division with remainder is to find the
remainder, which tells us whether b divides a or not.

It does not help (and it may be confusing) to form the fraction a/b, because
this brings us no closer to knowing whether b divides a. For example,
the fraction

43560029/77777

does not tell us whether 77777 divides 43560029 or not. To find out, we
need to know whether the remainder is 0 or not. We could do the full
division with remainder:


43560029 = 560 × 77777 + 4909


which tells us the exact remainder, 4909, or else evaluate the fraction numerically

43560029/77777 = 560.0631...,

which is enough to tell us that the remainder is ≠ 0. (And we can read off
the quotient q = 560 as the part before the decimal point, and hence find
the remainder, as 43560029 − 560 × 77777 = 4909.)
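
On a computer the same calculation might look as follows. Here integer division // supplies the part before the decimal point, which avoids any rounding issues with floating-point fractions. The variable names are ours.

```python
a, b = 43560029, 77777

q = a // b      # integer part of the fraction a/b
r = a - q * b   # remainder, as in the text
print(q, r)     # 560 4909
print(divmod(a, b) == (q, r))  # True
print(r == 0)   # False, so 77777 does not divide 43560029
```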


Exercises 


1.4.1 Using a calculator or computer, use the method above to find the remainder 
when 12345678 is divided by 3333. 


1.4.2 Calculate the multiples of 3333 on either side of 12345678. 


1.5 Binary notation


Division with remainder is the natural way to find the binary numeral of 
any natural number n. The digits of the numeral are found by dividing n 
by 2, writing the remainder, and repeating the process with the quotient 
until the quotient 0 is obtained. Then the sequence of remainders, written
in reverse order, is the binary numeral for n. 
Example. Binary numeral for 2001.


2001 = 1000 × 2 + 1
1000 = 500 × 2 + 0
500 = 250 × 2 + 0
250 = 125 × 2 + 0
125 = 62 × 2 + 1
62 = 31 × 2 + 0
31 = 15 × 2 + 1
15 = 7 × 2 + 1
7 = 3 × 2 + 1
3 = 1 × 2 + 1
1 = 0 × 2 + 1.


Hence the binary numeral for 2001 is 11111010001. 
A general binary numeral a_k a_{k−1} ... a_1 a_0, where each a_i is 0 or 1, stands
for the number

n = a_k 2^k + a_{k−1} 2^{k−1} + ⋯ + a_1 2 + a_0,

because repeated division of this number by 2 yields the successive remainders
a_0, a_1, ..., a_{k−1}, a_k. Thus one can reconstruct n from its binary digits
by multiplying them by the appropriate powers of 2 and adding.

However, it is more efficient to view a_k a_{k−1} ... a_1 a_0 as a code for constructing
n from the number 0 by a sequence of doublings (multiplications
by 2) and additions of 1, namely the reverse of the sequence of operations
by which the binary numeral was computed from n. Moving from left to
right, one doubles and adds a_i (if nonzero) for each digit a_i.

Figure 1.2 shows a way to set out the computation, recovering 2001 
from its binary numeral 11111010001. 
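
Both directions can be sketched in a few lines: computing the binary digits by repeated division by 2, and recovering the number by doubling and adding. The function names are our own.

```python
def binary_digits(n):
    """Binary digits of n >= 1, leading digit first, by repeated division by 2."""
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)
        remainders.append(r)
    return remainders[::-1]  # remainders written in reverse order


def from_binary(digits):
    """Rebuild n from 0: for each digit, left to right, double and add it."""
    n = 0
    for d in digits:
        n = 2 * n + d
    return n


digits = binary_digits(2001)
print("".join(map(str, digits)))  # 11111010001
print(from_binary(digits))        # 2001
```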


The number of operations 


The number of doublings in this process is one less than the number of
digits in the binary numeral for n, hence less than log_2 n, since the largest
number with k digits is 2^k − 1 (whose binary numeral consists of k ones),
and its log to base 2 is therefore < k = log_2(2^k).
Figure 1.2: Recovering a number from its binary numeral


Likewise, there are < log_2 n additions. So the total number of operations,
either doubling or adding 1, needed to produce n is < 2 log_2 n.


This observation gives a highly efficient way to compute powers, based
on repeated squaring. To form m^n, we begin with m = m^1, and repeatedly
double the exponent (by squaring) or add 1 to it (by multiplying by m).
Since we can reach exponent n by doubling or adding 1 less than 2 log_2 n
times, we can form m^n by squaring or multiplying by m less than 2 log_2 n
times. That is, it takes less than 2 log_2 n multiplications to form m^n.

Thus the number of operations is roughly proportional to the length of 
n (the number of its binary or decimal digits). Few problems in number 
theory can be solved in so few steps, and the fast solution of this particular 
problem is crucial in modern cryptography and electronic security systems 
(see Chapter 4). 
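
A sketch of the repeated-squaring idea, with a counter for the multiplications used. The function name fast_power is ours, and the binary digits of the exponent are taken from Python's built-in bin. Cryptographic uses normally reduce modulo some number at each step, which this sketch leaves out.

```python
def fast_power(m, n):
    """Compute m**n for n >= 1 and count the multiplications used."""
    bits = bin(n)[2:]     # binary numeral of n; its leading digit is 1
    result, count = m, 0  # begin with m = m**1
    for bit in bits[1:]:
        result *= result  # squaring doubles the exponent
        count += 1
        if bit == "1":
            result *= m   # multiplying by m adds 1 to the exponent
            count += 1
    return result, count


value, count = fast_power(3, 2001)
print(value == 3 ** 2001, count)  # True 16
```

For n = 2001 the count is 16 (ten squarings and six multiplications by m), which is indeed below 2 log_2 2001, roughly 21.9.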




Exercises 


Binary notation is more often used by computers than humans, since we have 
10 fingers and hence find it convenient to use base 10 rather than base 2. How- 
ever, some famous numbers are most simply written in binary. Examples are the 
Mersenne primes, which are prime numbers of the form 2^p − 1 where p is prime.


1.5.1 Show that the binary numeral for 2^p − 1 is 111···1 (p digits), and that
the first four Mersenne primes have binary numerals 11, 111, 11111, and
1111111.


1.5.2 However, not every prime p gives a prime 2^p − 1: factorize 2^11 − 1.


1.5.3 Show also that 2^n − 1 is never a prime when n is not prime. (Hint: suppose
that n = pq, let x = 2^p, and show that x − 1 divides x^q − 1.)


Mersenne primes are named after Marin Mersenne (1588-1648) who first 
drew attention to the problem of finding them. They occur (though not under 
that name) in a famous theorem of Euclid on perfect numbers. A number is called 
perfect if it equals the sum of its proper divisors (divisors less than itself). For 
example, 6 is perfect, because its proper divisors are 1, 2 and 3, and 6 = 1 + 2 + 3.
Euclid’s theorem is: if 2^p − 1 is prime then 2^{p−1}(2^p − 1) is perfect.

We discuss this theorem further in Chapter 2 when we have developed some 
theory of divisibility. In the meantime we observe that Euclid’s perfect numbers 
also have binary numerals of a simple form. 


1.5.4 Show that the first four perfect numbers arising from Mersenne primes have 
binary numerals 110, 11100, 111110000, and 1111111000000. 


1.5.5 What is the binary numeral for 2^{p−1}(2^p − 1)?


1.6 Diophantine equations 

Solving equations is the traditional goal of algebra, and particular parts 
of algebra have been developed to analyze particular methods of solution. 
Solution by radicals is one branch of the tradition, typified by the ancient
formula

x = (−b ± √(b^2 − 4ac)) / (2a)

for the solution of the general quadratic equation ax^2 + bx + c = 0, and by
more complicated formulas (involving cube roots as well as square roots)
for the solution of cubic and quartic equations. This method of solution is
analyzed by means of the field and group concepts, which lead to Galois 
theory. Its main results may be found in the companion book to this one, 
Stillwell (1994). 
The other important branch of the tradition is finding integer solutions,
the main theme of the present book. It leads to the ring concept and ideal
theory. Equations whose integer solutions are sought are called Diophantine,
even though it is not really the equations that are “Diophantine”, but
the solutions. Nevertheless, certain equations stand out as “Diophantine” 
because their integer solutions are of exceptional interest. 


• The Pythagorean equation x^2 + y^2 = z^2, whose natural number solutions
(x, y, z) are known as Pythagorean triples.

• The Pell equation x^2 − ny^2 = 1 for any nonsquare natural number n.

• The Bachet equation y^3 = x^2 + n for any natural number n.

• The Fermat equation x^n + y^n = z^n for any integer n > 2.


The Pythagorean equation is the oldest known mathematical problem, 
being the subject of a Babylonian clay tablet from around 1800 BCE known 
as Plimpton 322 (from its museum catalogue number). The tablet contains 
the two columns of natural numbers, y and z shown in Figure 1.3. 


Figure 1.3: Plimpton 322 
The left part of the table is missing, but it is surely a column of values
of x, because each value of z^2 − y^2 is an integer square x^2, and so the table
is essentially a list of Pythagorean triples.

This means that Pythagorean triples were known long before Pythago- 
ras (who lived around 500 BCE), and the Babylonians apparently had so- 
phisticated means of producing them. Notice that Plimpton 322 does not 
contain any well known Pythagorean triples, such as (3,4,5), (5,12, 13) or 
(8,15,17). It does, however, contain triples derived from these, mostly in 
nontrivial ways. 

Around 300 BCE, Euclid showed that all natural number solutions of
x^2 + y^2 = z^2 can be produced by the formulas

x = (u^2 − v^2)w, y = 2uvw, z = (u^2 + v^2)w

by letting u, v and w run through all the natural numbers. (Also the same
formulas with x and y interchanged.)

It is easily checked that these formulas give

x^2 + y^2 = z^2,

but it is not so easily seen that every solution is of Euclid’s form. Another
approach, using rational numbers, was found by Diophantus around 200 
CE. Diophantus specialized in solving equations in rationals, so his solu- 
tions are not properly “Diophantine” in our sense, but in this case rational 
and integer solutions are essentially equivalent. 
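
Euclid's formulas are easy to experiment with. The sketch below (the function name and the small ranges of u and v are our own choices) generates triples and confirms that each one satisfies the Pythagorean equation. It says nothing about the harder claim that every solution arises this way.

```python
def euclid_triple(u, v, w=1):
    """Euclid's formulas; taking u > v >= 1 keeps x positive."""
    x = (u * u - v * v) * w
    y = 2 * u * v * w
    z = (u * u + v * v) * w
    return x, y, z


for u in range(2, 5):
    for v in range(1, u):
        x, y, z = euclid_triple(u, v)
        assert x * x + y * y == z * z
        print((u, v), (x, y, z))
```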


Exercises 


1.6.1 Check (preferably with the help of a computer) that z^2 − y^2 is a perfect square
for each pair (y, z) in Plimpton 322.


1.6.2 Check also that x is a “round” number in the Babylonian sense, that is gen- 
erally divisible by 60, or at least by a divisor of 60. (The Babylonian system 
of numerals had base 60.) 


1.6.3 Verify that if
x = (u^2 − v^2)w, y = 2uvw, z = (u^2 + v^2)w
then x^2 + y^2 = z^2.


1.6.4 Find values of u and v (with w = 1) that yield the Pythagorean triples (3,4,5), 
(5,12,13), (7,24,25) and (8,15,17) when substituted in Euclid’s formulas. 




1.7 The Diophantus chord method


An integer solution (x, y, z) = (a, b, c) of x^2 + y^2 = z^2 implies

(a/c)^2 + (b/c)^2 = 1,

so X = a/c, Y = b/c is a rational solution of the equation

X^2 + Y^2 = 1,


in other words, a rational point on the unit circle. (Admittedly, any multi- 
ple of the triple, (ma,mb,mc), corresponds to the same point, but we can 
easily insert multiples once we have found a, b and c from X and Y.) 

Diophantus found rational points on X^2 + Y^2 = 1 by an algebraic method,
which has the geometric interpretation shown in Figure 1.4. 


Figure 1.4: The chord method for rational points


If we draw the chord connecting an arbitrary rational point R to the 
point Q = (—1,0) we get a line with rational slope, because the coordinates 
of R and Q are rational. If the slope is t, the equation of this line is 


Y=t(X +1). 


Conversely, any line of this form, with rational slope t, meets the circle at a 
rational point R. This can be seen by computing the coordinates of R. We
do this by substituting Y = t(X + 1) in X^2 + Y^2 = 1, obtaining

X^2 + t^2 (X + 1)^2 = 1,

which is the following quadratic equation for X:

X^2 (1 + t^2) + 2t^2 X + t^2 − 1 = 0.

The quadratic formula gives the solutions

X = −1, (1 − t^2)/(1 + t^2).

The solution X = −1 corresponds to the point Q, so the X coordinate at R
is (1 − t^2)/(1 + t^2), and hence the Y coordinate is

Y = t((1 − t^2)/(1 + t^2) + 1) = 2t/(1 + t^2).

To sum up: an arbitrary rational point on the unit circle X^2 + Y^2 = 1 has
coordinates

((1 − t^2)/(1 + t^2), 2t/(1 + t^2)), for arbitrary rational t.

Now we can recover Euclid’s formulas.
An arbitrary rational t can be written t = v/u where u, v ∈ Z, and the
rational point R then becomes

((1 − v^2/u^2)/(1 + v^2/u^2), (2v/u)/(1 + v^2/u^2)) = ((u^2 − v^2)/(u^2 + v^2), 2uv/(u^2 + v^2)).

Thus if this is

(x/z, y/z) for some x, y, z ∈ Z

we must have

x/z = (u^2 − v^2)/(u^2 + v^2), y/z = 2uv/(u^2 + v^2)

for some u, v ∈ Z.
Euclid’s formulas for x, y and z also give these formulas for x/z and
y/z, so the results of Euclid and Diophantus are essentially the same.
There is little difference between rational and integer solutions of the
equation x^2 + y^2 = z^2 because it is homogeneous in x, y and z, hence any
rational solution can be multiplied through to give an integer solution. The 
situation is quite different with inhomogeneous equations, such as y^2 =
x^3 − 2, where the integer solutions may be much harder to find.
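
The chord construction can be checked with exact rational arithmetic. This sketch assumes the standard parametrization X = (1 − t^2)/(1 + t^2), Y = 2t/(1 + t^2) of the second intersection point, and the function name chord_point is ours.

```python
from fractions import Fraction


def chord_point(t):
    """Second intersection of the line Y = t(X + 1) with the unit circle."""
    t = Fraction(t)
    X = (1 - t * t) / (1 + t * t)
    Y = 2 * t / (1 + t * t)
    return X, Y


X, Y = chord_point(Fraction(1, 2))
print(X, Y)                # 3/5 4/5
print(X * X + Y * Y == 1)  # True
```

With t = v/u = 1/2 the point (3/5, 4/5) corresponds to the triple (3, 4, 5).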

Diophantus’ method for rational solutions can be generalized to cubic 
equations, where it has enjoyed great success. See for example, Silverman 
and Tate (1992). However, it does not yield integer solutions except in the 
rare cases where the equation is homogeneous, and hence it diverges from 
the path we follow in this book. Indeed, it is often the case—for example 
with Bachet equations—that a cubic equation has infinitely many rational 
solutions and only finitely many integer solutions. Since we wish to study 
integer solutions, we now take our leave of the chord construction, and turn 
in the next section to an algebraic approach to Pythagorean triples: the use 
of “generalized integers”. 


Exercises 


Diophantus himself extended his method to equations of the form

y^2 = cubic in x,


where all coefficients are rational. Here the link between the geometry and the 
algebra is that a straight line through two rational points meets the curve in a 
third rational point. When there is only one “obvious” rational point on the curve, 
then one can use the tangent through this point instead of a chord, because the 
tangent meets the curve twice when viewed algebraically. 

The equation y^2 = x^3 − 2 is a good one to illustrate the tangent method, as well
as the formidable calculations it can lead to. (Note that this is a Bachet equation; 
here we have interchanged x and y to conform with the usual notation for cubic 
curves.) 


1.7.1 Show that the tangent to y^2 = x^3 − 2 at the “obvious” rational point (3, 5) is

y = 27x/10 − 31/10.

1.7.2 By substituting y = 27x/10 − 31/10 in the equation of the curve, show that the
tangent meets the curve where 100x^3 − 729x^2 + 1674x − 1161 = 0.


1.7.3 By dividing 100x^3 − 729x^2 + 1674x − 1161 twice by x − 3, or otherwise,
show that the tangent meets the curve twice at x = 3 and once at x = 129/100.


1.7.4 Hence find a rational point on y^2 = x^3 − 2 other than (3, ±5).


There are in fact infinitely many rational points on the curve y^2 = x^3 − 2
(though this was not known until 1930; see Mordell (1969), Chapter 26), but we
show later that its only integer points are (3, ±5).
1.8 Gaussian integers


The Pythagorean equation appears in a new light if we use complex num- 
bers to factorize the sum of two squares: 


x^2 + y^2 = (x − yi)(x + yi) where i = √−1.


Given that x and y are integers, the factors x − yi, x + yi may be regarded as
“complex integers”. We denote the set of such “integers” by


Z[i] = {a + bi : a, b ∈ Z}


and call them the Gaussian integers, after Gauss, who was the first to realize
that Z[i] has many properties in common with Z.

For a start, it is clear that the sum, difference and product of numbers
in Z[i] are also in Z[i], hence we can freely use +, −, and × and calculate
squares and Pythagorean triples. 


Two square identity. A sum of two squares times a sum of two squares is
a sum of two squares, namely

$$(a_1^2 + b_1^2)(a_2^2 + b_2^2) = (a_1 a_2 - b_1 b_2)^2 + (a_1 b_2 + b_1 a_2)^2.$$

Proof. We factorize the sums of two squares as above, then recombine the
two factors with negative signs, and the two factors with positive signs:

$$\begin{aligned}
(a_1^2 + b_1^2)(a_2^2 + b_2^2) &= (a_1 - b_1 i)(a_1 + b_1 i)(a_2 - b_2 i)(a_2 + b_2 i) \\
&= (a_1 - b_1 i)(a_2 - b_2 i)(a_1 + b_1 i)(a_2 + b_2 i) \\
&= [a_1 a_2 - b_1 b_2 - (a_1 b_2 + b_1 a_2) i] \times [a_1 a_2 - b_1 b_2 + (a_1 b_2 + b_1 a_2) i] \\
&= (a_1 a_2 - b_1 b_2)^2 + (a_1 b_2 + b_1 a_2)^2. \qquad \Box
\end{aligned}$$

Corollary. If the triples $(a_1, b_1, c_1)$ and $(a_2, b_2, c_2)$ are Pythagorean, then
so is the triple $(a_1 a_2 - b_1 b_2,\ a_1 b_2 + b_1 a_2,\ c_1 c_2)$.

Proof. If $(a_1, b_1, c_1)$ and $(a_2, b_2, c_2)$ are Pythagorean triples, then

$$a_1^2 + b_1^2 = c_1^2 \quad\text{and}\quad a_2^2 + b_2^2 = c_2^2.$$

It follows that

$$\begin{aligned}
(c_1 c_2)^2 = c_1^2 c_2^2 &= (a_1^2 + b_1^2)(a_2^2 + b_2^2) \\
&= (a_1 a_2 - b_1 b_2)^2 + (a_1 b_2 + b_1 a_2)^2 \quad\text{by the identity above,}
\end{aligned}$$

and this says that $(a_1 a_2 - b_1 b_2,\ a_1 b_2 + b_1 a_2,\ c_1 c_2)$ is a Pythagorean triple. □
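The rule in the Corollary is easy to try out numerically. The following is a minimal Python sketch, not part of the original text; the function names `compose_triples` and `is_pythagorean` are chosen here for illustration only.

```python
def compose_triples(t1, t2):
    """Combine two Pythagorean triples by the rule in the Corollary."""
    a1, b1, c1 = t1
    a2, b2, c2 = t2
    return (a1 * a2 - b1 * b2, a1 * b2 + b1 * a2, c1 * c2)


def is_pythagorean(t):
    """Check that the first two members are the legs and the third the hypotenuse."""
    a, b, c = t
    return a * a + b * b == c * c


new_triple = compose_triples((4, 3, 5), (12, 5, 13))
print(new_triple, is_pythagorean(new_triple))  # (33, 56, 65) True
```

Note that the first member $a_1 a_2 - b_1 b_2$ can come out zero or negative for some inputs; the identity still holds, since only squares are compared.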


Of course, the two square identity can be proved without using $\sqrt{-1}$,
by multiplying out both sides and comparing the results. And presumably
it was first discovered this way, because it was known long before the
introduction of complex numbers. Though first given explicitly by al-Khazin
around 950 CE, it seems to have been known to Diophantus, and perhaps
even to the Babylonians, because many of the triples implicit in Plimpton
322 can be obtained from smaller triples by the Corollary (see exercises).

However, the two square identity is more natural in the world C of 
complex numbers because it expresses one of their fundamental properties: 
namely, the multiplicative property of their norm. If $z = a + bi$ we define

$$\operatorname{norm}(z) = |a + bi|^2 = a^2 + b^2,$$

and it follows from the two square identity that

$$\operatorname{norm}(z_1)\operatorname{norm}(z_2) = \operatorname{norm}(z_1 z_2) \qquad (*)$$

because $z_1 = a_1 + b_1 i$ and $z_2 = a_2 + b_2 i$ imply

$$z_1 z_2 = a_1 a_2 - b_1 b_2 + (a_1 b_2 + b_1 a_2) i.$$


In algebra and complex analysis it is more common to state the multiplicative
property (*) in terms of the absolute value $|z| = \sqrt{a^2 + b^2}$, namely

$$|z_1||z_2| = |z_1 z_2|. \qquad (**)$$

(*) and (**) are obviously equivalent, but the norm is the more useful concept
in Z[i] because it is an ordinary integer, and this allows certain properties
of Z[i] to be derived from properties of Z.
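As a small sanity check of the multiplicative property (*), here is a Python sketch that represents a Gaussian integer a + bi as the integer pair (a, b), so that all arithmetic stays exact. The representation and the names `gmul` and `norm` are assumptions made for this illustration.

```python
import itertools


def gmul(z, w):
    """Product of Gaussian integers z = (a1, b1) and w = (a2, b2)."""
    a1, b1 = z
    a2, b2 = w
    return (a1 * a2 - b1 * b2, a1 * b2 + b1 * a2)


def norm(z):
    """norm(a + bi) = a^2 + b^2, an ordinary integer."""
    a, b = z
    return a * a + b * b


# Check norm(z) * norm(w) == norm(z * w) on a small grid of Gaussian integers.
grid = list(itertools.product(range(-3, 4), repeat=2))
print(all(norm(z) * norm(w) == norm(gmul(z, w)) for z in grid for w in grid))
```

A finite check like this proves nothing, of course; it only illustrates that the norm, unlike the absolute value, can be handled entirely with integer arithmetic.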

So much for the elementary properties of Gaussian integers. Z[i] also
has deeper properties in common with Z, involving divisors and primes.
These properties will be proved for Z in the next chapter, and for Z[i] in
Chapter 6.
However, we can travel a little further in the right direction by following
the dream that Z[i] holds the secrets of the Pythagorean equation


$$z^2 = x^2 + y^2 = (x - yi)(x + yi).$$

If the integers x and y have no common prime divisor, then it seems likely
that $x - yi$ and $x + yi$ also have no common prime divisor, whatever “prime”
means in Z[i]. If so, then it would seem that the factors $x - yi$, $x + yi$ of the
square $z^2$ are themselves squares in Z[i]. In particular,


$$x - yi = (u - vi)^2 \quad\text{for some } u, v \in \mathbb{Z}.$$


But in that case

$$x - yi = (u^2 - v^2) - 2uvi$$

and, equating real and imaginary parts,

$$x = u^2 - v^2, \quad y = 2uv, \quad\text{and hence}\quad z = u^2 + v^2.$$
Thus we have arrived again at Euclid’s formula for Pythagorean triples! 
(Or more precisely, the formula for primitive Pythagorean triples, from 
which all others are obtained as constant multiples. The primitive triples 
are those for which x, y, and z have no common prime divisor, and they 
result from u and v with no common prime divisor.) 
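The passage from $(u - vi)^2$ to a triple can be tabulated directly. The sketch below is an illustration only; the function name `triple_from_square` is hypothetical, and the primitivity test via `math.gcd` is simply a convenient way to inspect the output.

```python
from math import gcd


def triple_from_square(u, v):
    """Return (x, y, z) with x - yi = (u - vi)^2 and z = u^2 + v^2."""
    x = u * u - v * v
    y = 2 * u * v
    z = u * u + v * v
    return x, y, z


for u in range(2, 6):
    for v in range(1, u):
        x, y, z = triple_from_square(u, v)
        assert x * x + y * y == z * z
        primitive = gcd(gcd(x, y), z) == 1
        print((u, v), (x, y, z), "primitive" if primitive else "not primitive")
```

Running this shows, for instance, that (u, v) = (3, 1) gives (8, 6, 10): when u and v are both odd, all three members are even, so having no common prime divisor in u and v is not by itself enough for a primitive triple.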

The idea that factors of a square with no common prime divisor are
themselves squares is essentially correct in Z[i], but to see why we must
first understand why it is correct in N. This will be explained in the next
chapter.


Exercises 


The rule in the Corollary for generating new Pythagorean triples from old gives 
some interesting results. 


1.8.1 Find the Pythagorean triples generated from
• (4,3,5) and itself,
• (12,5,13) and itself,
• (15,8,17) and itself.
1.8.2 Do these results account for any of the entries in Plimpton 322? 


1.8.3 Try to generate other entries in Plimpton 322 from smaller triples. 
It is clear that we can generate infinitely many Pythagorean triples (x,y,z) but not
clear (even from Euclid’s formulas) whether there are any significant constraints 
on their members x, y, and z. For example, can we have x and y odd and z even? 
This question can be answered by considering remainders on division by 4. 


1.8.4 Show that the square of an odd integer 2n + 1 leaves remainder 1 on division
by 4.


1.8.5 What is the remainder when an even square is divided by 4? 


1.8.6 Deduce from Exercises 1.8.4 and 1.8.5 that the sum of odd squares is never 
a square. 


1.9 Discussion


The discovery of Pythagorean triples, in which the sum $x^2 + y^2$ of two
squares is itself a square, leads to a more general question: what values
are taken by $x^2 + y^2$ as x and y run through Z? The exercises above imply
that $x^2 + y^2$ cannot take a value of the form 4n + 3 (why?), and the main
problem in describing its possible values is to find the primes of the form
$x^2 + y^2$.

Such questions were first studied by Fermat around 1640, sparked by
his reading of Diophantus. He was able to answer them, and also the
corresponding questions for $x^2 + 2y^2$ and $x^2 + 3y^2$. In the 18th century this
led to study of the general quadratic form $ax^2 + bxy + cy^2$ by Euler,
Lagrange, Legendre and Gauss. The endpoint of these investigations was the
Disquisitiones Arithmeticae of Gauss (1801), a book of such depth and
complexity that the best number theorists of the 19th century (Dirichlet,
Kummer, Kronecker, and Dedekind) found that they had to rewrite it so
that ordinary mortals could understand Gauss’s results.

The reason that the Disquisitiones is so complex is that abstract alge- 
bra did not exist when Gauss wrote it. Without new algebraic concepts the 
deep structural properties of quadratic forms discovered by Gauss cannot 
be clearly expressed; they can barely be glimpsed by readers lacking the 
technical power of Gauss. It was precisely to comprehend Gauss’s ideas 
and convey them to others that Kummer, Kronecker, and Dedekind intro- 
duced the concepts of rings, ideals, and abelian groups. 

An intermediate step in the evolution of ring theory was the creation
of algebraic number theory: a theory in which algebraic numbers such
as $\sqrt{2}$ and i are used to illuminate the properties of natural numbers and
integers. Around 1770, Euler and Lagrange had already used algebraic
numbers to study certain Diophantine equations. For example, Euler
successfully found all the integer solutions of $y^3 = x^2 + 2$ by factorizing the
right-hand side into $(x + \sqrt{-2})(x - \sqrt{-2})$. He assumed that numbers of
the form $a + b\sqrt{-2}$ “behave like” integers when a and b themselves are
integers (see Section 7.1). The same assumption enables one to determine
all primes of the form $x^2 + 2y^2$.

Such reasoning was rejected by Gauss in the Disquisitiones, since it 
was not sufficiently clear what it meant for algebraic numbers to “behave 
like” integers. In 1801 Gauss may already have known systems of algebraic 
numbers that did not behave like the integers. He therefore worked directly 
with quadratic forms and their integer coefficients, subduing them with 
his awesome skill in traditional algebra. However, Gauss (1832) took the 
first step towards an abstract theory of algebraic integers by proving that 
the Gaussian integers Z[i] do indeed “behave like” the ordinary integers
Z, specifically with respect to prime factorization. Among other things,
this gives an elegant way to treat the quadratic form $x^2 + y^2$, as we see in
Chapter 6.

The great achievement of Kummer and Dedekind was to tame the systems
of algebraic numbers that do not behave like Z, by adjoining new
“numbers” to them. Kummer’s mystery ideal numbers, and Dedekind’s
demystification of them in 1871, are among the most dramatic discoveries
of mathematics. Ideal numbers also emerge naturally from the theory
of quadratic forms, in particular from the form $x^2 + 5y^2$, so we follow the
thread of quadratic forms throughout this book. Quadratic forms not only
give the correct historical context for most of the concepts normally covered
in ring theory but also provide the simplest and clearest examples.


PREVIEW 


Prime numbers may be regarded as the “building blocks” of the natural
numbers because any natural number is a product of primes.
(This explains, by the way, why 1 is not regarded as a prime:
nothing is built from products of 1 except 1 itself.) But even if
primes are the building blocks, it is not easy to grasp them directly.
There is no simple way to test whether a given natural number is
prime, nor to find the smallest prime divisor of a given number.


Instead of studying the divisors of single numbers it is better to study 
the common divisors of pairs a, b. The ancient Euclidean algorithm 
is a remarkably efficient way to find the greatest common divisor 
(gcd) of given natural numbers a and b and it throws unexpected 
light on prime numbers and prime factorization. 

It does so by representing gcd(a, b) as a linear combination ma+nb, 
where m and n are integers. This also leads to a clear understanding 
of the problem of solving linear equations in integers. 


2.1 The gcd by subtraction
If natural numbers a and b have a common divisor d, then

$$a = a'd \quad\text{and}\quad b = b'd$$

for some natural numbers a′ and b′. From this it follows that d divides a − b
because

$$a - b = a'd - b'd = (a' - b')d.$$
In other words, a common divisor of a and b is also a divisor of a − b.
Euclid used this fact to find the greatest common divisor, gcd(a,b), by
“repeatedly subtracting the smaller number from the larger”. More precisely,
his algorithm goes as follows.
Suppose that a > b and let

$$a_1 = a, \quad b_1 = b.$$

Then for each pair $(a_i, b_i)$ we form the pair $(a_{i+1}, b_{i+1})$, where

$$a_{i+1} = \max(b_i, a_i - b_i), \quad b_{i+1} = \min(b_i, a_i - b_i).$$

Since this process produces smaller and smaller natural numbers, it must
halt (by “descent”). We eventually get

$$a_k = b_k,$$

in which case we conclude that $\gcd(a,b) = a_k = b_k$.
The reason this algorithm works is that

$$\gcd(a_1, b_1) = \gcd(a_2, b_2) = \cdots = \gcd(a_k, b_k),$$

since any common divisor of the pair $(a_1, b_1)$ is also a divisor of the pairs
$(a_2, b_2)$, $(a_3, b_3)$, ..., $(a_k, b_k)$ produced by the successive subtractions
(and conversely, because $a_i = (a_i - b_i) + b_i$, any common divisor of a later
pair also divides the earlier ones).
Example. a = 34, b= 19 


The algorithm gives the following pairs: 


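$$\begin{aligned}
(a_1, b_1) &= (34, 19) \\
(a_2, b_2) &= (19, 34 - 19) = (19, 15) \\
(a_3, b_3) &= (15, 19 - 15) = (15, 4) \\
(a_4, b_4) &= (15 - 4, 4) = (11, 4) \\
(a_5, b_5) &= (11 - 4, 4) = (7, 4) \\
(a_6, b_6) &= (4, 7 - 4) = (4, 3) \\
(a_7, b_7) &= (3, 4 - 3) = (3, 1) \\
(a_8, b_8) &= (3 - 1, 1) = (2, 1) \\
(a_9, b_9) &= (2 - 1, 1) = (1, 1)
\end{aligned}$$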


and therefore gcd(34, 19) = gcd(1,1) = 1. 
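The subtractive algorithm translates almost word for word into code. The following Python sketch is an illustration under the assumption that a and b are natural numbers; the function name `gcd_by_subtraction` is chosen here and is not from the text.

```python
def gcd_by_subtraction(a, b):
    """Return (gcd, pairs), where pairs lists every (a_i, b_i) produced."""
    a, b = max(a, b), min(a, b)
    pairs = [(a, b)]
    while a != b:
        # Replace the pair by the smaller number and the difference,
        # keeping the larger of the two in first place.
        a, b = max(b, a - b), min(b, a - b)
        pairs.append((a, b))
    return a, pairs


g, pairs = gcd_by_subtraction(34, 19)
print(g)      # 1
print(pairs)  # nine pairs, from (34, 19) down to (1, 1)
```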
Integer pairs a, b such that gcd(a, b) = 1 are said to be relatively prime.
Thus the Euclidean algorithm gives a simple means of deciding whether
integers are relatively prime. In the next section we see that the algorithm
(in a slightly modified form) is also highly efficient: it gives gcd(a,b) in
a number of steps comparable with the total number of digits in a and b.
It is harder to recognize whether a single integer n is prime: the obvious
methods require a number of steps comparable with the size of n, which is
exponentially larger (around $2^k$ if k is the number of binary digits of n).


Exercises 


Starting with a pair of natural numbers and running the subtractive algorithm 
backwards—that is, repeatedly adding the two numbers most recently produced— 
gives what is called a Lucas sequence. The most famous of them is the Fibonacci 
sequence 1, 1, 2,3, 5, 8, 13, ..., obtained by starting with the pair (1, 1). 


2.1.1 Explain why the gcd of any two successive Fibonacci numbers is 1. 


2.1.2 Consider the Lucas sequence that begins with 1, 3, 4, 7, 11, 18, 29, .... 
What is the gcd of any two successive terms? 


The exponential difficulty of testing whether an integer n is prime can be seen
in the case of the well-known method of trial division by integers $\le \sqrt{n}$.


2.1.3 If n has a divisor $\ne 1, n$, explain why there must be such a divisor $\le \sqrt{n}$.


2.1.4 If n has k digits in its binary numeral, show that there are at most $2^{k/2}$
numbers $\le \sqrt{n}$. Can there be exactly $2^{k/2}$?


The fundamental fact about common divisors, that if d divides a and b then d
divides a + b, throws light on primitive Pythagorean triples.


2.1.5 If $x = 2uv$ and $y = u^2 - v^2$, show that (x,y,z) is a primitive Pythagorean


2.2 The gcd by division with remainder 

Euclid’s form of the gcd algorithm is usually speeded up by doing division
with remainder instead of repeated subtraction. Given a pair $(a_i, b_i)$ with
$a_i > b_i$, the next pair is produced by the rule

$$a_{i+1} = b_i, \quad b_{i+1} = \text{remainder when } a_i \text{ is divided by } b_i.$$

This is more efficient when $a_i$ is many times as large as $b_i$, in which case
many subtractions are replaced by one division.
However, the algorithm is essentially the same (division of natural
numbers is just repeated subtraction), so it is still true that

$$\gcd(a_1, b_1) = \gcd(a_2, b_2) = \cdots$$

The only difference is that halting now occurs when $b_k$ divides $a_k$, in which
case we conclude that $\gcd(a,b) = \gcd(a_k, b_k) = b_k$.


Example. a = 34, b = 19 again. 
The algorithm with division gives the following pairs: 


$$\begin{aligned}
(a_1, b_1) &= (34, 19) \\
(a_2, b_2) &= (19, 34 - 19) = (19, 15) \\
(a_3, b_3) &= (15, 19 - 15) = (15, 4) \\
(a_4, b_4) &= (4, 15 - 3 \times 4) = (4, 3) \\
(a_5, b_5) &= (3, 4 - 3) = (3, 1)
\end{aligned}$$


Hence gcd(34, 19) = 1 because 1 divides 3. 


In this form of the algorithm it is easy to see that the number of divi- 
sions is comparable with the total number of digits in a and b. In fact, if 
a and b are written in binary, then each division reduces the total number 
of digits by at least one. If a has more digits than b this is clear: the new 
pair is b together with a remainder on division by b that has no more digits 
than b. If a and b have the same number of digits then, since both a and b 
necessarily begin with the digit 1, the remainder is simply a — b, and it has 
fewer digits than b. 
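The digit-counting argument can be watched in action. The sketch below assumes natural numbers a > b and uses Python's `int.bit_length` for the number of binary digits; the names `gcd_by_division` and `total_bits` are chosen for this illustration.

```python
def gcd_by_division(a, b):
    """Return (gcd, pairs) for natural numbers a > b, halting when b_k divides a_k."""
    pairs = [(a, b)]
    while a % b != 0:
        a, b = b, a % b
        pairs.append((a, b))
    return b, pairs


def total_bits(pair):
    """Total number of binary digits in the two members of a pair."""
    return pair[0].bit_length() + pair[1].bit_length()


g, pairs = gcd_by_division(34, 19)
print(g, pairs)
print([total_bits(p) for p in pairs])  # [11, 9, 7, 5, 3]
```

For (34, 19) the total drops by two at every division, comfortably within the bound of at least one digit per step.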

The division form of the Euclidean algorithm is not only more efficient;
it also has wider applicability. For example, in Z[i] we can divide 17 by
$4 + i$ (exactly) and get the quotient $4 - i$, but it is meaningless to subtract
$4 + i$ from 17 “$4 - i$ times”. Thus division in Z[i] is not generally the result
of repeated subtraction. Any Euclidean algorithm in Z[i] (and we see one
in Section 6.4) necessarily uses division with remainder.


Exercises 


The division form of the Euclidean algorithm on (a,b), where a > b, occurs when
the fraction a/b is expanded as

$$\frac{a}{b} = q + \frac{r}{b} = q + \frac{1}{b/r},$$

where q is the quotient and r the remainder when a is divided by b,
and the process may then be repeated for the fraction b/r since b > r by
construction. However, the pair (b,r) is smaller than the initial pair (a,b), so this process
terminates. The result is called the continued fraction for a/b.


2.2.1 Starting with

$$\frac{34}{19} = 1 + \frac{15}{19} = 1 + \frac{1}{19/15} = \cdots,$$

show that the continued fraction for 34/19 is

$$1 + \cfrac{1}{1 + \cfrac{1}{3 + \cfrac{1}{1 + \cfrac{1}{3}}}}.$$

2.2.2 Show similarly that

$$\frac{43}{30} = 1 + \cfrac{1}{2 + \cfrac{1}{3 + \cfrac{1}{4}}}.$$


2.2.3 Show in general that

$$\frac{a}{b} = q_1 + \cfrac{1}{q_2 + \cfrac{1}{q_3 + \cdots}}$$

where $q_1, q_2, q_3, \ldots$ are the successive quotients occurring when the
division form of the Euclidean algorithm is applied to (a,b).
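The link between quotients and continued fractions can be explored with exact rational arithmetic. This is a sketch only: the names `cf_quotients` and `cf_value` are hypothetical, and `fractions.Fraction` from the Python standard library is used to rebuild the fraction from its quotients.

```python
from fractions import Fraction


def cf_quotients(a, b):
    """Successive quotients of the division form of the Euclidean algorithm on (a, b)."""
    quotients = []
    while b != 0:
        q, r = divmod(a, b)
        quotients.append(q)
        a, b = b, r
    return quotients


def cf_value(quotients):
    """Evaluate q1 + 1/(q2 + 1/(q3 + ...)) exactly, working from the innermost term out."""
    value = Fraction(quotients[-1])
    for q in reversed(quotients[:-1]):
        value = q + 1 / value
    return value


qs = cf_quotients(34, 19)
print(qs)            # [1, 1, 3, 1, 3]
print(cf_value(qs))  # 34/19
```

Unlike the halting rule used in the text, this sketch runs until the remainder is 0, so the last quotient recorded is the one from the final exact division.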


In the 18th century, Euler saw that the Euclidean algorithm could be implemented 
by continued fractions, and this became the favored way to describe it for a cen- 
tury or more. For example, Gauss ignores Euclid and refers exclusively to the 
“continued fraction” algorithm in his Disquisitiones. The Euclidean algorithm as 
we know it made a comeback with Dirichlet’s Vorlesungen über Zahlentheorie
(lectures on number theory) of 1863. 


2.3 Linear representation of the gcd 


Probably the most important consequence of the Euclidean algorithm is
that

$$\gcd(a,b) = ma + nb \quad\text{for some integers } m \text{ and } n.$$

In fact it is true that all the numbers $a_i$ and $b_i$ produced by the Euclidean
algorithm are of the form ma + nb for integers m and n, and the $b_i$ of course
include $\gcd(a,b) = b_k$.
We prove this statement about $a_i$ and $b_i$ by the “ascent” form of induction.
For a start, we certainly have

$$a_1 = 1 \times a + 0 \times b, \quad b_1 = 0 \times a + 1 \times b,$$

so the statement is true for i = 1. And if $a_i$ and $b_i$ are both of the form
ma + nb the same is true for their difference, hence for $a_{i+1}$ and $b_{i+1}$. Thus
all numbers produced from the pair (a,b) by the Euclidean algorithm are
of the form ma + nb, as required. □


This proof also suggests a way to find the m and n for a given a and b:
run the Euclidean algorithm to find gcd(a,b) and keep track of the coefficients
m and n for each number $a_i$ and $b_i$ that the algorithm produces.

A practical way to do this is shown in the following example, in which 
the numerical calculation of gcd on 34, 19 is run in parallel with a symbolic 
calculation on letters a, b. Each time we subtract some multiple of the 
second number from the first we do exactly the same operation on letters. 
Hence the final combination of letters equals the gcd. 


Example. gcd(34,19) = gcd(a,b) in the form ma+nb. For efficiency, 
we use division with remainder, subtracting the appropriate multiple of the 
second number from the first to get the remainder at each step. 


$$\begin{aligned}
(34, 19) &= (a, b) \\
(19, 15) &= (b, a - b) \\
(15, 4) &= (a - b, b - (a - b)) = (a - b, -a + 2b) \\
(4, 3) &= (-a + 2b, a - b - 3(-a + 2b)) = (-a + 2b, 4a - 7b) \\
(3, 1) &= (4a - 7b, -a + 2b - (4a - 7b)) = (4a - 7b, -5a + 9b).
\end{aligned}$$

From the last line we read off the gcd,

$$1 = -5a + 9b.$$

This checks, because

$$-5 \times 34 + 9 \times 19 = -170 + 171 = 1.$$
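The parallel numeric and symbolic calculation can be sketched in Python by carrying, alongside each number, the coefficients of a and b that produce it. The function name `linear_gcd` is an assumption of this illustration, and the inputs are assumed to be natural numbers.

```python
def linear_gcd(a, b):
    """Return (g, m, n) with g = gcd(a, b) = m*a + n*b."""
    # Invariant: x == mx*a + nx*b and y == my*a + ny*b.
    x, mx, nx = a, 1, 0
    y, my, ny = b, 0, 1
    while y != 0:
        q = x // y
        # Subtract q times the second number from the first, and do exactly
        # the same operation on the coefficient pairs; then swap roles.
        x, mx, nx, y, my, ny = y, my, ny, x - q * y, mx - q * my, nx - q * ny
    return x, mx, nx


print(linear_gcd(34, 19))  # (1, -5, 9)
```

This version runs one division further than the worked example, until the remainder is 0, which is why the gcd ends up in the first position.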


The Euclidean algorithm is extremely important in practice and theory.
It is useful in practice because it is unusually fast: it gives the gcd of
k-digit numbers in around k steps, much faster than any known algorithm
for finding divisors of one k-digit number.
And the gcd is not only simpler in practice but also in theory. The basic
theory of divisors and primes is based on the theory of the gcd, as we see
in Section 2.4.

We often call on the Euclidean algorithm to find gcd(a,b) and to find 
integers m and n such that gcd(a,b) = ma+nb. So make sure you get 
plenty of practice in using it right now! 


Exercises 


2.3.1 Find gcd(63, 13) by the Euclidean algorithm, and hence find m and n such 
that 63m-+ 13n = 1. 


2.3.2 Find m and n such that 55m + 34n = 1.


2.4 Primes and factorization 


In Section 1.1 we used a descent argument to show that certain natural 
numbers have prime factors. A slight generalization of the argument shows: 


Existence of prime factorization. Each natural number n can be written 
as a product of primes, 


$$n = p_1 p_2 p_3 \cdots p_k.$$


Proof. If n itself is a prime there is nothing to do. If not, n = ab for some
smaller natural numbers a and b. If a or b is not prime we split it into
smaller factors, and so on. Since natural numbers cannot decrease forever,
we eventually get a factorization

$$n = p_1 p_2 p_3 \cdots p_k$$

in which no $p_i$ is a product of smaller numbers. That is, each $p_i$ is prime. □
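The splitting argument in the proof is itself a recursive procedure. The Python sketch below follows it literally, assuming n is at least 2; the function name `prime_factors` is chosen here, and the search for a divisor by trial division up to the square root is an implementation choice, not something the proof prescribes.

```python
from math import isqrt


def prime_factors(n):
    """List of primes whose product is n, found by repeated splitting n = a * b."""
    for a in range(2, isqrt(n) + 1):
        if n % a == 0:
            # n = a * (n // a) with both factors smaller than n: split each in turn.
            return prime_factors(a) + prime_factors(n // a)
    return [n]  # no smaller factors exist, so n is prime


print(prime_factors(666))   # [2, 3, 3, 37]
print(prime_factors(1000))  # [2, 2, 2, 5, 5, 5]
```

The recursion terminates for the same reason the proof does: every split produces strictly smaller natural numbers.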


So much for the existence of prime factorization, which is important 
enough because it implies the existence of infinitely many primes, as in 
Section 1.1. Even more important is the uniqueness of prime factoriza- 
tion—no matter how we split n into smaller factors we always arrive at the 
same primes in the end. 
Prime divisor property. If a prime p divides the product of natural numbers
a and b, then p divides a or p divides b.


Proof. Suppose p does not divide a, so we need to show that p divides b.
Now if p does not divide a we have gcd(a, p) = 1, since the only divisors
of p are 1 and p. Therefore, by the result in Section 2.3,

$$1 = ma + np \quad\text{for some integers } m \text{ and } n.$$

Multiplying both sides of this equation by b we get

$$b = mab + npb.$$

Now look at the right-hand side: p divides ab by assumption, and p obviously
divides pb. Thus p divides both terms on the right, and hence it
divides their sum. That is, p divides b, as required. □


Unique prime factorization. The prime factorization of each natural 
number is unique (up to the order of factors). 


Proof. Suppose on the contrary that some natural number has two different
prime factorizations. Cancelling any primes common to both factorizations,
we get equal products of primes,

$$p_1 p_2 p_3 \cdots p_r = q_1 q_2 q_3 \cdots q_s$$

where no prime $p_i$ equals any prime $q_j$. This leads to a contradiction as
follows.
Since $p_1$ is a factor of the left-hand side, $p_1$ also divides the right-hand
side. But then, by repeatedly using the prime divisor property, we get

$$\begin{aligned}
& p_1 \text{ divides } q_1 q_2 q_3 \cdots q_s \\
\Rightarrow\ & p_1 \text{ divides } q_1 \text{ or } p_1 \text{ divides } q_2 q_3 \cdots q_s \\
\Rightarrow\ & p_1 \text{ divides } q_1 \text{ or } p_1 \text{ divides } q_2 \text{ or } p_1 \text{ divides } q_3 \cdots q_s \\
& \ \vdots \\
\Rightarrow\ & p_1 \text{ divides } q_1 \text{ or } p_1 \text{ divides } q_2 \ \ldots \text{ or } p_1 \text{ divides } q_s,
\end{aligned}$$

which contradicts the assumption that no $p_i$ equals any $q_j$, because a prime
that divides the prime $q_j$ must equal $q_j$. Thus no natural
number has two different prime factorizations. □


Although the prime divisor property was proved by Euclid (around 300 
BCE), unique prime factorization was mentioned for the first time by Gauss 
in 1801. 
Exercises


Gauss proved unique prime factorization by a novel proof of the prime divisor 
property that goes as follows. 


2.4.1 First show that a prime p cannot divide a product $a_1 b_1$ for natural numbers
$a_1, b_1 < p$. Namely, suppose that p divides $a_1 b_1$, and show that p also
divides $a_1 b_2$, where

$$b_2 = \text{remainder when } p \text{ is divided by } b_1,$$

which gives an infinite descent.


2.4.2 Now use Exercise 2.4.1 to deduce the prime divisor property by showing
that if p divides ab, and p divides neither a nor b, then p divides an $a_1 b_1$
where $a_1, b_1 < p$.


2.5 Consequences of unique prime factorization 


If $c = p_1^{m_1} p_2^{m_2} \cdots p_k^{m_k}$, where $p_1, p_2, \ldots, p_k$ are primes and $m_1, m_2, \ldots, m_k$
are natural numbers, then

$$c^2 = p_1^{2m_1} p_2^{2m_2} \cdots p_k^{2m_k}.$$

Thus in the prime factorization of a square natural number each prime
occurs to an even power. And conversely, if

$$d = p_1^{2m_1} p_2^{2m_2} \cdots p_k^{2m_k}$$

then $d = c^2$. Thus in fact a natural number d is a square if and only if each
prime in the prime factorization of d occurs to an even power.

Now suppose that d is a square, and that d = ab, where a and b have no
common prime divisor (or, as we said in Section 2.1, a and b are relatively
prime). Then we have a prime factorization of the form

$$d = ab = p_1^{2m_1} p_2^{2m_2} \cdots p_k^{2m_k}.$$

Since a and b have no common prime divisor, each term $p_i^{2m_i}$ must be part
of the prime factorization of one of a and b and completely absent from
the other. In other words, in the prime factorizations of a and b each prime
occurs to an even power, and hence a and b are both squares by the remark
in the previous paragraph.
To sum up, we have the following proposition. 
Relatively prime factors of a square. If a and b are relatively prime
natural numbers whose product is a square, then a and b are squares.


Using the fact that

$$c^3 = p_1^{3m_1} p_2^{3m_2} \cdots p_k^{3m_k},$$

there are similar proofs that a natural number is a cube if and only if each
prime in its prime factorization occurs to a power that is a multiple of 3,
and that if a and b are relatively prime natural numbers whose product is
a cube, then a and b are cubes.
Another important consequence of the prime factorization of a square 
is the existence of irrational square roots. 


Irrational square roots. If N is a nonsquare natural number, then $\sqrt{N}$ is
irrational.


Proof. Suppose that N is a natural number and that $\sqrt{N}$ is rational, that is,
$\sqrt{N} = a/b$ for some natural numbers a and b.
We then have to show that N is a square. Squaring both sides, we get

$$N = a^2/b^2 = p_1^{2m_1} p_2^{2m_2} \cdots p_k^{2m_k}$$

for some primes $p_1, p_2, \ldots, p_k$. Each prime occurs to an even power, namely,
twice its power in a minus twice its power in b. But then N is a square,
as required, by the argument above (which also applies when some $m_i$ are
negative). □


Prime factorization, gcd, and lcm 


Unique prime factorization implies that each prime divisor of a natural
number n actually appears in the prime factorization of n. And any common
prime divisor of a and b will appear in both their prime factorizations.
Hence the greatest common divisor of a and b is the product of the common
primes in their prime factorizations, each taken to the smaller of the two
powers with which it occurs.


Examples.

$$666 = 2 \times 3^2 \times 37, \qquad 1000 = 2^3 \times 5^3,$$

hence gcd(666, 1000) = 2.

$$4444 = 2^2 \times 11 \times 101, \qquad 9090 = 2 \times 3^2 \times 5 \times 101,$$

hence gcd(4444, 9090) = 2 × 101 = 202.


This method is quite effective for numbers that are small enough to fac- 
torize into primes. However, it is completely outclassed by the Euclidean 
algorithm for larger numbers. Also, it should be borne in mind that the fac- 
torization method is justified by unique prime factorization, which depends 
on the theory of the Euclidean algorithm. 


Prime factorizations also give the least common multiple (lcm) of two 
natural numbers. Any common multiple of a and b must be a multiple of 
each prime power in a and b, hence the least common multiple of a and 
b is the product of the maximum prime powers occurring in their prime 
factorizations. 


Examples. Using the factorizations of 666, 1000 and 4444, 9090 given 
above, we find 


$$\operatorname{lcm}(666, 1000) = 2^3 \times 3^2 \times 5^3 \times 37 = 333000,$$

and

$$\operatorname{lcm}(4444, 9090) = 2^2 \times 3^2 \times 5 \times 11 \times 101 = 199980.$$
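The factorization method for gcd and lcm can be sketched as follows. This is an illustration only: the names `factor` and `gcd_lcm` are hypothetical, the factorization is by simple trial division, and `collections.Counter` is used because it reports exponent 0 for primes that are absent.

```python
from collections import Counter
from math import prod


def factor(n):
    """Map each prime divisor of the natural number n to its exponent."""
    exponents = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            exponents[d] += 1
            n //= d
        d += 1
    if n > 1:
        exponents[n] += 1  # whatever remains is a prime
    return exponents


def gcd_lcm(a, b):
    """gcd takes the minimum power of each prime, lcm the maximum."""
    fa, fb = factor(a), factor(b)
    primes = fa.keys() | fb.keys()
    g = prod(p ** min(fa[p], fb[p]) for p in primes)
    l = prod(p ** max(fa[p], fb[p]) for p in primes)
    return g, l


print(gcd_lcm(666, 1000))   # (2, 333000)
print(gcd_lcm(4444, 9090))  # (202, 199980)
```

As the text warns, this approach is only practical when the numbers are small enough to factorize.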


Exercises 


As mentioned in Section 1.3, the concept of prime number is more complicated in 
Z than in N, because the unit —1 can be part of a factorization. This complicates 
the situation with squares and cubes in Z, but only slightly. 


2.5.1 If a and b are relatively prime integers whose product is a square, show by 
means of an example that a and b are not necessarily squares. If they are 
not squares, what are they? 


2.5.2 On the other hand, if a and b are relatively prime integers whose product is
a cube, then a and b are cubes. Why?


The Euclidean algorithm shows immediately that gcd(2000, 2001) = 1. Still 
it is interesting to actually see that 2000 and 2001 have no common prime divisor. 


2.5.3 Find the prime factorizations of 2000 and 2001, thereby confirming that 
gcd(2000, 2001) = 1. 
It is also useful to have formulas for gcd(a,b) and lcm(a,b) in terms of the
prime factorizations of a and b.


2.5.4 Suppose that $p_1, p_2, \ldots, p_k$ are all the primes that divide a or b, and that

$$a = p_1^{m_1} p_2^{m_2} \cdots p_k^{m_k}, \qquad b = p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}.$$

Deduce that

$$\gcd(a, b) = p_1^{\min(m_1, n_1)} p_2^{\min(m_2, n_2)} \cdots p_k^{\min(m_k, n_k)},$$
$$\operatorname{lcm}(a, b) = p_1^{\max(m_1, n_1)} p_2^{\max(m_2, n_2)} \cdots p_k^{\max(m_k, n_k)}.$$


2.5.5 Deduce from Exercise 2.5.4 that gcd(a,b) lcm(a,b) = ab.


Now that we know uniqueness of prime factorization, we can revisit Euclid’s 
theorem about perfect numbers, mentioned in the exercises to Section 1.5. 


than it) are $1, 2, 2^2, \ldots, 2^{p-1}$ and $q, 2q, 2^2 q, \ldots, 2^{p-2} q$.


2.5.7 Show that $1 + 2 + 2^2 + \cdots + 2^{p-2} = 2^{p-1} - 1$, and deduce that the sum of
the proper divisors of $2^{p-1} q$ is $2^{p-1} q$. (That is, $2^{p-1} q$ is perfect.)


2.6 Linear Diophantine equations 


The simplest nontrivial Diophantine equations are linear equations in two
variables,

$$ax + by = c, \quad\text{where } a, b, c \in \mathbb{Z}.$$

Such an equation may have infinitely many solutions or none. For example,
the equation

$$6x + 15y = 0$$

has the infinitely many solutions x = 15t, y = −6t as t runs through the
integers. On the other hand, the equation

$$6x + 15y = 1$$

has no integer solutions. This is so because 3 divides 6x + 15y when x and
y are integers (since 3 divides both 6 and 15) but 3 does not divide 1. This
example shows that common divisors are involved in linear Diophantine
equations, and exposes the key to their solution: the linear representation
of the gcd found in Section 2.3.
Criterion for solvability of linear Diophantine equations. When a, b, c
are integers, the equation ax + by = c has an integer solution if and only if
gcd(a, b) divides c.


Proof. Since gcd(a, b) divides a and b, it divides ax + by for any integers x
and y. Therefore, if ax + by = c, then gcd(a,b) divides c.

Conversely, we know from Section 2.3 that gcd(a,b) = am + bn for
some integers m and n. Hence if gcd(a, b) divides c we have

$$c = \gcd(a,b)\,d = (am + bn)d = amd + bnd \quad\text{for some } d \in \mathbb{Z}.$$

But then x = md, y = nd is a solution of ax + by = c. □


This proof also shows how to find a solution of ax + by = c if one exists.
Namely, express gcd(a,b) in the form am + bn, using the symbolic
Euclidean algorithm to find m and n, then multiply m and n by the integer
d such that c = gcd(a,b)d.

If there is one solution $x = x_0$ and $y = y_0$, then there are infinitely many,
because we can add to the pair $(x_0, y_0)$ any of the infinitely many solutions
of ax + by = 0.


General solution of ax + by = c. The solution of ax + by = c in Z is
$x = x_0 + bt/\gcd(a,b)$, $y = y_0 - at/\gcd(a,b)$, where $x = x_0$, $y = y_0$ is any
particular solution and t runs through Z.


Proof. Since $x = bt/\gcd(a,b)$, $y = -at/\gcd(a,b)$ is clearly an integer
solution of ax + by = 0, adding it to any solution $x = x_0$, $y = y_0$ of ax + by = c
gives another solution of ax + by = c.

Conversely, if x, y is any solution of ax + by = c, then $x' = x - x_0$,
$y' = y - y_0$ satisfies $ax' + by' = 0$. But any integer solution of $ax' + by' = 0$
is a solution of the equation

$$a'x' = -b'y'$$

whose coefficients are the relatively prime integers $a' = a/\gcd(a,b)$ and
$b' = b/\gcd(a,b)$.

Since a′ and b′ have no common prime divisor, it follows from the
unique prime factorization of both sides of the equation $a'x' = -b'y'$ that
b′ divides x′. That is,

$$x' = b't \quad\text{for some integer } t, \text{ and hence } y' = -a't.$$

Substituting the values of x′, y′, a′, b′ back in the equations above yields

$$x = x_0 + bt/\gcd(a,b), \quad y = y_0 - at/\gcd(a,b),$$

as claimed. □
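The solvability criterion and the general solution combine into a short procedure. The sketch below assumes natural numbers a and b and an integer c; the names `linear_gcd` and `solve_linear` are chosen for this illustration, and the coefficient-tracking loop is one way of carrying out the symbolic Euclidean algorithm.

```python
def linear_gcd(a, b):
    """Return (g, m, n) with g = gcd(a, b) = m*a + n*b."""
    x, mx, nx = a, 1, 0
    y, my, ny = b, 0, 1
    while y != 0:
        q = x // y
        x, mx, nx, y, my, ny = y, my, ny, x - q * y, mx - q * my, nx - q * ny
    return x, mx, nx


def solve_linear(a, b, c):
    """Return a function t -> (x, y) describing all solutions of a*x + b*y = c,
    or None when gcd(a, b) does not divide c."""
    g, m, n = linear_gcd(a, b)
    if c % g != 0:
        return None
    d = c // g
    x0, y0 = m * d, n * d  # particular solution
    return lambda t: (x0 + (b // g) * t, y0 - (a // g) * t)


print(solve_linear(6, 15, 1))  # None, since 3 does not divide 1
solution = solve_linear(6, 15, 9)
print([solution(t) for t in range(-1, 2)])
```

Returning a function of t is just one design choice; it mirrors the statement that the solutions are parametrized by t running through Z.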


Exercises 


The criterion for solvability can also be derived directly, by proving the following 
result without appeal to the Euclidean algorithm. 


2.6.1 Show that $\{am + bn : m, n \in \mathbb{Z}\}$ consists of all integer multiples of gcd(a, b).


However, the Euclidean algorithm is invaluable for actually finding solutions 
to linear Diophantine equations. 


2.6.2 Find an integer solution of 34x + 19y = 1.
2.6.3 Also find an integer solution of 34x + 19y = 7.


2.6.4 Is there an integer solution of 34x + 17y = 1?


2.7 *The vector Euclidean algorithm


In Section 2.3 we used an extension of the Euclidean algorithm to compute
the gcd of integers a and b in the form

$$\gcd(a,b) = ma + nb \quad\text{for some } m, n \in \mathbb{Z}.$$

The extension runs the ordinary algorithm (“subtracting the smaller number
from the larger”) and uses it to guide a symbolic imitation that performs
the same operations on linear combinations of the letters a and b.

We now wish to analyze the symbolic part of the algorithm more closely
in the case where a and b are relatively prime. To do so we replace each
linear combination $m_i a + n_i b$ by the ordered pair, or vector, $(m_i, n_i)$. To
enable the ordinary algorithm to run as simply as possible we take a > 0
and b < 0 and keep the positive number in the first place and the negative
in the second.

Then each step of the ordinary Euclidean algorithm is actually an ad- 
dition: the number with the larger absolute value being replaced by its sum 
with the other number. The corresponding steps in the symbolic algorithm 
are vector additions, so we call the resulting process the vector Euclidean 
algorithm. 
Example. Figure 2.1 shows the steps of the vector Euclidean algorithm on
(12,−5), with number pairs in the first column, symbolic pairs in the second
column, and vector pairs in the third. The actual additions are shown
only in the symbolic column.


Number pairs | Symbolic pairs                          | Vector pairs
(12,−5)      | (a,b)                                   | ((1,0), (0,1))
(7,−5)       | (a+b,b)                                 | ((1,1), (0,1))
(2,−5)       | ((a+b)+b,b) = (a+2b,b)                  | ((1,2), (0,1))
(2,−3)       | (a+2b,b+(a+2b)) = (a+2b,a+3b)           | ((1,2), (1,3))
(2,−1)       | (a+2b,a+3b+(a+2b)) = (a+2b,2a+5b)       | ((1,2), (2,5))
(1,−1)       | (a+2b+(2a+5b),2a+5b) = (3a+7b,2a+5b)    | ((3,7), (2,5))


Figure 2.1: Outputs of Euclidean algorithms


From the bottom line we read off (as in Section 2.3) that 
$1 = 3a + 7b = 3 \times 12 - 7 \times 5$


so (m,n) = (3,7) is a natural number vector such that 12m − 5n = 1.

It is also interesting to run the algorithm one step further (adding the 
number 1 to the number —1 in the first column to get 0), because 12 and 5 
then reappear in the vector column. 


(1,0) | (3a+7b,2a+5b+ (3a+7b)) = (3a+7b,5a+12b) | ((3,7), (5,12)) 


Figure 2.2: Result of the extra step 


This is not surprising because 0 = 5 × 12 − 12 × 5, though conceivably
we could have obtained a larger multiple of the vector (5,12). What is
interesting is how easily we arrive at the vector (5,12): namely, we started
with the vectors i = (1,0) and j = (0,1), and took a series of steps in which
a vector pair $(v_1, v_2)$ was replaced by either $(v_1 + v_2, v_2)$ or $(v_1, v_1 + v_2)$.
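
The example can be replayed mechanically. The following is a minimal sketch, assuming the convention of this section (positive number first, negative number second); the name `vector_euclid` and the tie-breaking rule at (1,−1) are choices made for the illustration.

```python
def vector_euclid(a, b):
    """Run the vector Euclidean algorithm on the number pair (a, b), a > 0 > b.

    Returns the list of (number pair, vector pair) steps, ending when the
    second number reaches 0 (this includes the extra step discussed above).
    """
    x, y = a, b
    v1, v2 = (1, 0), (0, 1)          # x = v1 . (a, b),  y = v2 . (a, b)
    steps = [((x, y), (v1, v2))]
    while y != 0:
        total = (v1[0] + v2[0], v1[1] + v2[1])
        if x > -y:                   # x has the larger absolute value
            x, v1 = x + y, total
        else:                        # y has the larger (or an equal) absolute value
            y, v2 = y + x, total
        steps.append(((x, y), (v1, v2)))
    return steps

for numbers, vectors in vector_euclid(12, -5):
    print(numbers, vectors)
```

The second-to-last line printed is the number pair (1, -1) with vector pair ((3, 7), (2, 5)), and the last is (1, 0) with ((3, 7), (5, 12)).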

We now generalize this example to show: 


Relative primality in the vector Euclidean algorithm. In running the
vector Euclidean algorithm:
1. Every vector produced from (1,0) and (0,1) is a relatively prime 
pair of natural numbers. (We call such a vector primitive.) 


2. Every relatively prime pair (a,b) of natural numbers can be pro- 
duced (by starting the ordinary Euclidean algorithm on b and —a). 
Proof. 1. It is clear that any vector produced is a pair of natural numbers,
because the first new pair is (1,1) and further vector additions cannot
decrease the members of the pair.

To see why each pair produced is relatively prime we prove a stronger
property: if $((m_1, n_1), (m_2, n_2))$ is the vector pair at any step, then


$m_1 n_2 - n_1 m_2 = 1.$


This is true at the first step, when $(m_1, n_1) = (1,0)$ and $(m_2, n_2) = (0,1)$.
And if it is true for the vector pair $((m_1, n_1), (m_2, n_2))$ then it is also true for
the next pair $((m_1 + m_2, n_1 + n_2), (m_2, n_2))$ or $((m_1, n_1), (m_1 + m_2, n_1 + n_2))$.
This is so because


$(m_1 + m_2) n_2 - (n_1 + n_2) m_2 = m_1 n_2 - n_1 m_2 = 1$


and

$m_1 (n_1 + n_2) - n_1 (m_1 + m_2) = m_1 n_2 - n_1 m_2 = 1.$

It follows that each vector $(m_1, n_1)$ produced is a relatively prime pair,
because any common divisor of $m_1$ and $n_1$ also divides $m_1 n_2 - n_1 m_2 = 1$.
Similarly for each vector $(m_2, n_2)$.

2. If a and b are relatively prime natural numbers then the vector Euclidean
algorithm, guided by the ordinary Euclidean algorithm on b and
−a, produces a vector (m,n) such that mb − na = 0, and m and n are relatively
prime by part 1.

Since prime factorization is unique, mb = na for relatively prime a, b
and relatively prime m, n implies m = a and n = b. Hence any relatively
prime pair (a,b) can be produced by the vector Euclidean algorithm. □
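
Both parts of the proof can be spot-checked numerically. The sketch below is an illustration under assumptions: the function name `produced_vector` is hypothetical, and the run is continued until the second guiding number reaches 0. It asserts the invariant of part 1 at every step and confirms part 2 for all relatively prime pairs below 30.

```python
from math import gcd

def produced_vector(a, b):
    """Vector reached when the numbers b and -a guide the vector algorithm to 0."""
    x, y = b, -a                         # guiding numbers: positive first, negative second
    (m1, n1), (m2, n2) = (1, 0), (0, 1)
    while y != 0:
        if x > -y:
            x, m1, n1 = x + y, m1 + m2, n1 + n2
        else:
            y, m2, n2 = y + x, m1 + m2, n1 + n2
        assert m1 * n2 - n1 * m2 == 1    # the invariant from part 1 of the proof
    return (m2, n2)

pairs = [(a, b) for a in range(1, 30) for b in range(1, 30) if gcd(a, b) == 1]
print(all(produced_vector(a, b) == (a, b) for a, b in pairs))
```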


Exercises 


The proof of relative primality in the vector Euclidean algorithm applies whether 
or not the guiding numbers b and —a are relatively prime. 


2.7.1 If b and —a are not relatively prime, which vector (m,n) such that mb = na 
is produced by the vector Euclidean algorithm? 


As we saw in Section 2.6, the symbolic Euclidean algorithm is used when
solving linear Diophantine equations. The above analysis of the vector algorithm
directly shows its connection with certain equations. Suppose that we run the
ordinary Euclidean algorithm on the numbers b and −a until 1 and −1 are produced,
and suppose that the corresponding vector pair is $((m_1, n_1), (m_2, n_2))$.


2.7.2 Show that $(x,y) = (m_1, n_1)$ is the least positive solution of bx − ay = 1 and
that $(x,y) = (m_2, n_2)$ is the least positive solution of bx − ay = −1.
2.8 *The map of relatively prime pairs


The results of the previous section are presented graphically by Figure 2.3,
which we call the map of relatively prime pairs or primitive vectors. It is
a partition of the plane by an infinite tree into regions labelled by ordered
integer pairs (a,b). The top two regions are labelled (1,0) and (0,1), and
the other labels are generated by vector addition: if regions labelled $v_1$
and $v_2$ share an edge, then the region below the bottom end of the edge is
labelled $v_1 + v_2$.


Figure 2.3: Regions labelled by relatively prime pairs 


From the portion of the map shown in Figure 2.3, it appears that all
labels are distinct and each of them except (1,0) and (0,1) is a relatively prime
pair of natural numbers. This can be proved by relating the map to the
vector Euclidean algorithm: the map is in fact a panoramic view of all 
outcomes of the algorithm, in the sense that each sequence of vector pairs 
produced by a run of the algorithm occurs as the sequence of pairs of labels 
flanking the edges (to left and right) in a finite path down the tree. This is 
so because both are governed by the vector addition rule. 

Thus, the sequence ((1,0),(0,1)), ((1,1),(0,1)), ..., ((3,7),(2,5)) in
the example of Section 2.7 is the sequence of left/right label pairs for the
path shown in Figure 2.4.


Figure 2.4: The branch leading to (5, 12) 


Conversely, any path down the tree, starting with the edge between 
(1,0) and (0, 1) and ending at the top of region (a, b), is flanked by left/right 
pairs of labels that are precisely the pairs produced by the vector Euclidean 
algorithm with input numbers b and —a. This is so because the paths in a 
tree are unique, hence the path to the top vertex of region (a,b) must be the 
one corresponding to the vector Euclidean algorithm running on b and —a. 

This correspondence between paths and runs of the vector Euclidean 
algorithm allows us to deduce the basic properties of the map from proper- 
ties of the algorithm proved in the previous section. 

1. Each region of the map, except those labelled (1,0) and (0,1), is 
labelled by a relatively prime pair of natural numbers. This follows from 
Property 1 of the vector Euclidean algorithm.

2. Each relatively prime pair (a,b) of natural numbers occurs as a 
label. This follows from Property 2 of the vector Euclidean algorithm. 

3. Each label occurs only once. This is so because we reach the label 
(a,b) by running the ordinary Euclidean algorithm on b and —a, and the 
run determines a unique path in the tree. 
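
The three properties can be checked on a finite portion of the map. The sketch below is only an illustration: the function name `map_labels` and the representation of an edge as a (left label, right label) pair are assumptions made here, not notation from the text.

```python
from math import gcd

def map_labels(depth):
    """Labels of the regions reached by at most `depth` steps below the top edge."""
    labels = []
    edges = [((1, 0), (0, 1))]             # (left label, right label) flanking each edge
    for _ in range(depth):
        next_edges = []
        for left, right in edges:
            below = (left[0] + right[0], left[1] + right[1])   # vector addition rule
            labels.append(below)
            next_edges += [(left, below), (below, right)]
        edges = next_edges
    return labels

labels = map_labels(6)
print(len(labels), len(set(labels)))                 # equal counts: no label repeats
print(all(gcd(a, b) == 1 for a, b in labels))        # every label is a primitive vector
print((5, 12) in labels)                             # the label of Figure 2.4
```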


Exercises 


The map of relatively prime pairs has been discovered and rediscovered several 
times in the history of mathematics without ever becoming known well enough to 
acquire an official name. Perhaps its best known role is in representing rational 
numbers, since there is a one-to-one correspondence between positive rational 
numbers and reduced fractions a/b, which correspond in turn to relatively prime
pairs (a,b) of natural numbers. This idea is known under the name of Farey 
fractions, and accounts of it may be found in Conway (1997), Rademacher (1983), 
and Hardy and Wright (1979). 

The connection between reduced fractions and regions goes deeper than the 
obvious correspondence a/b ↔ (a,b); it also preserves order. That is, the
ordering of fractions from large to small corresponds to the ordering of regions from
left to right.


2.8.1 Show that if regions $(m_1, n_1)$ and $(m_2, n_2)$ meet along an edge, with region $(m_1, n_1)$ on the left,
then $m_1/n_1 > m_2/n_2$.


2.8.2 Deduce that, if region $(m_1, n_1)$ is anywhere to the left of $(m_2, n_2)$, then
$m_1/n_1 > m_2/n_2$.


2.8.3 Use Exercise 2.8.2 to give another proof that each label (a,b) occurs only 
once. 


The tree structure of the Farey fractions is known as the Stern-Brocot tree. It 
is obtainable from our map by moving each label other than (1,0) and (0,1) to 
the vertex above it in Figure 2.3. More on the Stern-Brocot tree may be found in 
Graham et al. (1994).

We have taken our form of the map from Conway (1997), who uses it to give 
a very simple and graphic way of studying quadratic forms. For this purpose, the
map has an advantage over the tree because it admits a natural extension to a map 
with regions labelled by all pairs of relatively prime integers. We take up this idea 
in Chapter 5. 
Discussion


The results of this chapter are our answer to the question posed in Chapter 
1 in connection with Z and Z[i]: what does it mean to “behave like” the
integers? Roughly speaking, the operations + and x should make sense 
and have the ring properties, there should be primes, and there should be 
unique prime factorization (or equivalently, the prime divisor property). 
The importance of unique prime factorization was first recognized by 
Gauss (1801) although, as mentioned in Section 2.4, the equivalent prime 
divisor property was known to Euclid. Another remarkable equivalent of 
unique prime factorization was discovered by Euler (1748a). It is his product
formula for what is now called the zeta function ζ(s), defined by the
following two expressions:


$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_{p \text{ prime}} \frac{1}{1 - p^{-s}}.$ (*)


It is not obvious that these expressions are equal, and indeed their equality
is equivalent to unique prime factorization! If we expand each factor on
the right in a geometric series


$\frac{1}{1 - p^{-s}} = 1 + p^{-s} + p^{-2s} + p^{-3s} + \cdots,$

then the product of all the factors will be the sum of 1 together with every
possible term of the form


$p_1^{-m_1 s} p_2^{-m_2 s} \cdots p_r^{-m_r s} = \frac{1}{(p_1^{m_1} p_2^{m_2} \cdots p_r^{m_r})^s},$


where $p_1, p_2, \ldots, p_r$ are distinct primes and $m_1, m_2, \ldots, m_r$ are natural numbers.
The products $p_1^{m_1} p_2^{m_2} \cdots p_r^{m_r}$ include each natural number n > 1 exactly once,
just in case unique prime factorization holds, in which case we get the
formula (*).

What makes the product formula (*) even more amazing is that it also 
implies the infinitude of primes, thus unifying the two most important theo- 
rems about primes. Euler’s proof of infinitude uses the special value s = 1. 
If there are only finitely many primes, then the right-hand side of (*) is 
finite for s = 1, whereas the left-hand side is 1 + 1/2 + 1/3 + 1/4 + ···, which is
well known to be infinite. Thus we have a contradiction, so there must be 
infinitely many primes. 
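
The equality of the two expressions in (*) can be made plausible numerically at s = 2, where both sides converge. The sketch below is an illustration only; the helper names, the cutoffs, and the comparison with the known value π²/6 of ζ(2) are additions made here and do not come from the text.

```python
from math import pi

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(range(i * i, n + 1, i))
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def zeta_sum(s, terms):
    """Partial sum of 1/n^s for n = 1, ..., terms."""
    return sum(1 / n ** s for n in range(1, terms + 1))

def zeta_product(s, bound):
    """Partial product of 1/(1 - p^(-s)) over the primes p <= bound."""
    product = 1.0
    for p in primes_up_to(bound):
        product *= 1 / (1 - p ** (-s))
    return product

print(zeta_sum(2, 10000), zeta_product(2, 10000), pi ** 2 / 6)
```

The three printed values agree to about three decimal places; truncating the sum and the product accounts for the remaining difference.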

The Euclidean algorithm was historically decisive for unique prime 
factorization, establishing this property for Z, Z[i] and several other rings
we meet later. Even before unique prime factorization was noticed, the
algorithm was used by ancient Indian and Chinese mathematicians to solve
linear Diophantine equations. Such equations arise in “calendar problems”,
where one has, say, a year of 365¼ days and a lunar cycle of 29½ days and
one wants to know such things as the next time that there will be a new 
moon on the first day of the year. 

The modern history of the Euclidean algorithm begins with the discovery
of Gauss (1832) that it also applies to Z[i]. Dirichlet made the algorithm
the basis of his Vorlesungen of 1863, using it to derive the basic results
about Z in much the same way as we have here. The Vorlesungen went
through four editions, evolving after Dirichlet’s death through the editorial 
work of Dedekind, who began to enlarge it with Supplements from 1871 
onwards. In the successive versions of Supplements X and XI Dedekind 
gradually freed number theory from dependence on the Euclidean algo- 
rithm by developing ideal theory, a development we take up in the last few 
chapters of this book. 


PREVIEW 
Many questions in arithmetic reduce to questions about remainders 
that can be answered in a systematic manner. For each integer n > 1
there is an arithmetic “mod n” that mirrors ordinary arithmetic but is
finite, since it involves only the n remainders 0, 1, 2, ..., n − 1 occurring
on division by n. Arithmetic mod n, or congruence arithmetic,
is the subject of this chapter. 


We motivate congruence arithmetic with some arithmetic folklore: 
the test for divisibility by 9 by “casting out nines”. This is explained 
by the arithmetic of +, —, and x mod 9, and it leads naturally to 
+, —, and x mod n, and to the problem of division mod n. It turns 
out that division (by nonzero numbers) is possible mod n when n is 
prime, but not generally. 


Division by a nonzero number a, mod n reduces to the problem of
finding an inverse of a, mod n, that is, finding a b such that ab leaves
remainder 1 on division by n. This turns out to be a simple spinoff
of the procedure used in Chapter 2 to find integers m and n such that
ma + nb = gcd(a,b), using the Euclidean algorithm.


Related to the subtleties of division are the classical theorems of 
Fermat, Euler, and Wilson, which are important throughout num- 
ber theory and its applications. The most famous application, the 
RSA cryptosystem, is discussed in the next chapter, but the present 
chapter paves the way for it. 


We also pave the way for studying quadratic forms $ax^2 + bxy + cy^2$
by using congruence arithmetic to show that certain values are impossible
for the forms $x^2 + y^2$, $x^2 + 2y^2$, and $x^2 + 3y^2$.
3.1 Congruence mod n


Casting out nines 


An old rule to test whether a natural number is divisible by 9 is to see 
whether the sum of its digits is divisible by 9. For example, 774 is divisible
by 9 because 

7+7+4= 18, 


which is divisible by 9. 

This rule, called casting out nines, not only decides divisibility but in 
fact gives the remainder on division by 9. For example, if we add the digits 
of 476 we get 

4 + 7 + 6 = 17,


which leaves remainder 8 on division by 9. This is also the remainder when 
476 is divided by 9. 
Now of course 476 does not stand for 4+ 7+ 6 but for 


$4 \times 10^2 + 7 \times 10 + 6.$


Yet somehow, as far as remainders are concerned, 4 + 7 + 6 behaves like
$4 \times 10^2 + 7 \times 10 + 6$.
To explain how this happens, we introduce the concept of congruence. 


Definition. Integers a and b are said to be congruent mod n, written
$a \equiv b \pmod n,$


if they leave the same remainder on division by n. Equivalently, a is con- 
gruent to b, mod n, if n divides a — b. 


We also say that a and b belong to the same congruence class, mod n. 

Congruence mod 2 is the most familiar type of congruence in daily life, 
where we have words for numbers congruent to 0 (the even numbers), the 
numbers congruent to 1 (the odd numbers), and for numbers in the same 
congruence class (they have the same parity). 

Congruence mod 2 is easy to recognize in decimal notation, as is con- 
gruence mod 5 and 10. We can tell immediately, for example, that 1244788 
is even, 1244785 is divisible by 5, and 1244780 is divisible by 10. This is 
because numbers are congruent mod 2, 5, or 10 if their last digits are
congruent mod 2, 5, or 10 respectively.

Likewise, we can tell whether numbers are congruent mod 4 by looking
at their last two digits, and similar results apply to congruence mod other 
products of 2 and 5. 

Congruence mod 9, the concept relevant to casting out nines, is not so 
easy to understand. For this we need congruence arithmetic. 


Exercises 


The rules given above for recognizing congruence mod 2, 5, and 10 are easy to 
explain and generalize. 


3.1.1 Explain why the remainder of any natural number on division by 2 is the 
same as the remainder of its last digit. 


3.1.2 Why does the same apply to division by 5 and 10, but not to division by 4? 


3.1.3 Show that the remainder of n on division by 4 is the same as the remainder 
of the number given by the last two digits of n. 


3.1.4 How many digits determine the remainder on division by 8; by 16? 


3.2 Congruence classes and their arithmetic 


The integers that leave remainder a on division by n form what is called 
the congruence class of a, 


$\{nk + a : k \in \mathbb{Z}\},$


which we denote by the natural notation nZ + a (or just nZ when a = 0).
For example


2Z = {even numbers},
2Z + 1 = {odd numbers}.

Each congruence class is a set of equally spaced points along the number
line. For example, the classes 3Z, 3Z + 1 and 3Z + 2 look like the white,
grey and black points respectively in Figure 3.1.


Figure 3.1: The congruence classes mod 3 
Such pictures suggest that if we add any point in nZ + a to any point in
nZ + b we get a point in nZ + (a + b). We can also see this algebraically:
any point in nZ + a is of the form nk + a and any point in nZ + b is of the
form nl + b, so their sum n(k + l) + (a + b) is in nZ + (a + b).

Therefore, it is meaningful to define the sum of congruence classes by 


(nZ+a)+(nZ+b)=nZ+ (a+b), 


since we land in the class of a+b whichever elements we add from the 
class of a and the class of b. Similarly, it is meaningful to define the differ- 
ence of congruence classes by 


(nZ + a) − (nZ + b) = nZ + (a − b).

Finally we have a product of congruence classes defined by

(nZ + a)(nZ + b) = nZ + ab,


although it is not so obvious that any element of nZ + a times any element
of nZ + b will be an element of nZ + ab. To see why, take any member
nk + a from nZ + a and any member nl + b from nZ + b. Their product is


$(nk + a)(nl + b) = n^2 kl + nkb + nla + ab = n(nkl + kb + la) + ab,$


which is indeed a member of nZ + ab.
Another way to handle addition of congruence classes is by “addition 
of congruences”. If we have the congruences


$a_1 \equiv a_2 \pmod n$ (1)


and

$b_1 \equiv b_2 \pmod n$ (2)


then (1) says $a_1$ and $a_2$ are in the same congruence class, call it nZ + a, and
(2) says $b_1$ and $b_2$ are in the same congruence class, call it nZ + b. Then it
follows that the sums $a_1 + b_1$ and $a_2 + b_2$ belong to the same congruence
class nZ + (a + b), hence


$a_1 + b_1 \equiv a_2 + b_2 \pmod n$ (3)


Congruence (3) is the result of “adding” congruences (1) and (2).




Similarly, we can show by subtraction and multiplication of congruence
classes that (1) and (2) imply


$a_1 - b_1 \equiv a_2 - b_2 \pmod n$ (4)

(“subtraction of congruences”) and

$a_1 b_1 \equiv a_2 b_2 \pmod n$ (5)


(“multiplication of congruences”).
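
That the sum, difference, and product do not depend on the chosen members of each class can be spot-checked on a few representatives. The values n = 7, a = 3, b = 5 and the helper name `same_class` below are arbitrary choices for the illustration.

```python
def same_class(x, y, n):
    """True when x and y lie in the same congruence class mod n."""
    return (x - y) % n == 0

n, a, b = 7, 3, 5
class_a = [n * k + a for k in range(-3, 4)]   # a few members of nZ + a
class_b = [n * j + b for j in range(-3, 4)]   # a few members of nZ + b

sums_ok = all(same_class(x + y, a + b, n) for x in class_a for y in class_b)
differences_ok = all(same_class(x - y, a - b, n) for x in class_a for y in class_b)
products_ok = all(same_class(x * y, a * b, n) for x in class_a for y in class_b)
print(sums_ok, differences_ok, products_ok)   # True True True
```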


Remark. The system of congruence classes mod n, under the operations 
of + and ×, is denoted by Z/nZ. This agrees with the quotient notation for
groups (see Elements of Algebra, Section 7.8), since nZ is a subgroup of
Z, and the congruence classes nZ + a are the cosets of nZ in Z. However,
in this book, Z/nZ has the additional structure given by the × operation.


Casting out nines again 


Using arithmetic mod 9 we can now explain the method of casting out 
nines introduced in Section 3.1. 
First note that 
$10 \equiv 1 \pmod 9,$


and therefore

$10^2 \equiv 1^2 \equiv 1 \pmod 9,$
$10^3 \equiv 1^3 \equiv 1 \pmod 9,$


and so on, by multiplication of congruences.
For any integer $a_i$ it follows, by multiplication of congruences, that


$a_i 10^i \equiv a_i \pmod 9,$

and finally, by addition of congruences, that

$a_k 10^k + \cdots + a_1 10 + a_0 \equiv a_k + \cdots + a_1 + a_0 \pmod 9.$ (*)


But if $a_0, a_1, \ldots, a_k$ are between 0 and 9 (that is, they are all decimal
“digits”), then $a_k 10^k + \cdots + a_1 10 + a_0$ is the number whose decimal
numeral is $a_k \cdots a_1 a_0$.

Thus (*) says that, on division by 9, $a_k \cdots a_1 a_0$ leaves the same remainder
as the sum $a_k + \cdots + a_1 + a_0$, as required for casting out nines.
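
Congruence (*) is easy to test by machine. The sketch below uses hypothetical function names; it repeats the digit sum until a single digit remains and compares the result with the remainder on division by 9.

```python
def digit_sum(n):
    """Sum of the decimal digits of a natural number n."""
    return sum(int(d) for d in str(n))

def remainder_mod_9(n):
    """Cast out nines: replace n by its digit sum until one digit is left."""
    while n >= 10:
        n = digit_sum(n)
    return 0 if n == 9 else n

print(digit_sum(476), remainder_mod_9(476), 476 % 9)   # 17 8 8
```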
Exercises

There is an identical rule (which could be called “casting out threes”) for testing
divisibility by 3, and a very similar rule for testing divisibility by 11.

3.2.1 Show that the above argument applies to show that

$a_k 10^k + \cdots + a_1 10 + a_0 \equiv a_k + \cdots + a_1 + a_0 \pmod 3,$

and hence that a number is divisible by 3 if and only if the sum of its digits
is divisible by 3.


3.2.2 Use $10 \equiv -1 \pmod{11}$ to find what $10^2, 10^3, \ldots$ are congruent to, mod 11.


3.2.3 Deduce from Exercise 3.2.2, using multiplication and addition of congruences,
that $a_k a_{k-1} \cdots a_1 a_0$ is divisible by 11 if and only if the “alternating
sum” of its digits, $(-1)^k a_k + \cdots + a_2 - a_1 + a_0$, is divisible by 11.


3.3 Inverses mod p


In Z, the equation ab = 1 has only two solutions: a = b = 1 and a = b = −1.
Another way to put this is that 1 and −1 are the only integers with multiplicative
inverses.

The situation is more interesting mod p, for prime p. In this case, if
$a \not\equiv 0 \pmod p$ then there is a number b such that


$ab \equiv 1 \pmod p.$

We say that each $a \not\equiv 0 \pmod p$ has a multiplicative inverse, mod p.

Example. p = 5

1 has inverse 1, 2 has inverse 3, 3 has inverse 2, 4 has inverse 4.


The condition $a \not\equiv 0 \pmod p$ means that p does not divide a. Since p
is prime, it follows that gcd(a, p) = 1. By Section 2.3, this implies that


$ma + np = 1$

for some $m, n \in \mathbb{Z}$. In other words,

$ma \equiv 1 \pmod p,$

so m is the required inverse of a, mod p. □


Thus we can find the inverse m of a from the calculation (based on the 
Euclidean algorithm) that finds the m and n such that gcd(a,b) = ma + nb.
It follows that the computation of an inverse mod p is fast—it takes about 
n steps for an n digit prime p. 
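
As a sketch of this calculation (with a hypothetical function name, and tracking only the coefficient of a, since the coefficient of p is not needed for the inverse):

```python
def inverse_mod(a, p):
    """Return m with m*a ≡ 1 (mod p), assuming gcd(a, p) = 1."""
    x, mx = a % p, 1      # invariant: x ≡ mx * a (mod p)
    y, my = p, 0          # invariant: y ≡ my * a (mod p)
    while y != 0:
        q = x // y        # one division with remainder per step
        x, mx, y, my = y, my, x - q * y, mx - q * my
    if x != 1:
        raise ValueError("a has no inverse mod p")
    return mx % p

print([inverse_mod(a, 5) for a in range(1, 5)])   # [1, 3, 2, 4]
```

The output reproduces the example p = 5 above.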




Groups 


The existence of inverses for all the nonzero congruence classes mod p 
implies that these congruence classes form a group, a concept briefly men- 
tioned in Section 1.3 that we now review. 

A group is a set G together with an operation on it, the group oper- 
ation, with the associative, identity, and inverse properties. If the group 
operation is written as multiplication, then the identity element is written 
1, the inverse of $g \in G$ is written $g^{-1}$, and the three properties are:


$g_1 (g_2 g_3) = (g_1 g_2) g_3$ (Associativity)
$g1 = 1g = g$ (Identity property)
$g g^{-1} = g^{-1} g = 1$ (Inverse property)


Now we can formally confirm that the nonzero congruence classes mod 
p form a group under multiplication. We call this group (Z/pZ)*.


Group properties of (Z/pZ)*. For a prime p, the nonzero congruence 
classes mod p form a group under multiplication. 


Proof. First note that multiplication of congruence classes “inherits” asso- 
ciativity from the associativity of multiplication in Z as follows: 


class of a x (class of b x class of c) 
= class of a(bc) by definition 
= class of (ab)c since a(bc) = (ab)c by associativity in Z 


= (class of a x class of b) x class of c _ by definition. 


It also follows from the existence of inverses that the product of nonzero
congruence classes, mod p, is again a nonzero class. If $ab \equiv 0 \pmod p$ with
$b \not\equiv 0 \pmod p$, and we multiply both sides by
the inverse c of b, we get $(ab)c \equiv 0 \times c \equiv 0 \pmod p$ by multiplication of
congruences. Hence the right-hand side 0 is congruent to the left-hand side


$(ab)c \equiv a(bc) \pmod p$ by associativity
$\equiv a(1) \pmod p$ since c is inverse to b
$\equiv a \pmod p.$


Thus the product is zero only when a factor is zero, hence the set of nonzero
congruence classes is closed under multiplication, mod p.

We also have an identity element, namely the class of 1, and every
element has an inverse by assumption. Thus (Z/pZ)* has all the defining
properties of a group. □


(Z/pZ)* has the additional property that characterizes abelian groups:
$g_1 g_2 = g_2 g_1$ (Commutativity)


Most of the groups in this book are abelian, but the first theorem that we 
use—Lagrange’s theorem—is easily proved in full generality. The proof is 
based on the concepts of subgroup and cosets. 

A subset H of G that forms a group under the group operation in G is 
called a subgroup of G, and the (left) cosets of H in G are the sets of the
form

$gH = \{gh : h \in H\},$

for all $g \in G$. Different $g_1, g_2 \in G$ do not necessarily produce different
cosets $g_1 H$, $g_2 H$. For example, $h_1 H = H$ for any $h_1 \in H$, because each
$h_1 h \in H$ when $h \in H$, and conversely each $h_2 \in H$ is of the form $h_1 h$ for
some $h \in H$, namely $h = h_1^{-1} h_2$.

In fact, the proof we are about to give shows that the number of cosets
gH for a subgroup H of a finite group G is precisely |G|/|H|, where |G| and
|H| denote the “size” (that is, number of elements) of G and H respectively.


Lagrange’s theorem. If H is a subgroup of a finite group G, then |H|
divides |G|.

Proof. First observe that each coset gH has the same size as H; the mapping
from H to gH that sends h to gh can be reversed by multiplying on
the left by $g^{-1}$. Thus all cosets have the same number of elements.

Second, we observe that any two cosets with a common element are
identical. If $g \in g_1 H$ and $g \in g_2 H$ then


$g = g_1 h_1$ for some $h_1 \in H$, $\quad g = g_2 h_2$ for some $h_2 \in H$,

and therefore $g_1 h_1 = g_2 h_2$. Multiplying this on the right by $h_1^{-1}$, we find
that $g_1 = g_2 h_2 h_1^{-1}$, and therefore

$g_1 H = g_2 h_2 h_1^{-1} H = g_2 (h_2 h_1^{-1} H).$


But $h_2 h_1^{-1} \in H$ and so, by the example preceding the proof, $h_2 h_1^{-1} H = H$.
Hence $g_1 H = g_2 H$ as claimed.

These two observations together show that the |G| elements of G fall
into disjoint cosets gH of equal size |H|. Hence |H| divides |G|. □
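
The two observations in the proof can be watched at work in a small case. The sketch below is an illustration only: the choice of the group of nonzero classes mod 13, the subgroup H of powers of 3, and the function name `cosets` are all assumptions made here.

```python
def cosets(p, a):
    """Cosets of H = {powers of a} in the group of nonzero classes mod the prime p."""
    G = set(range(1, p))
    H, x = {1}, a % p
    while x != 1:                 # collect a, a^2, ... until 1 reappears
        H.add(x)
        x = x * a % p
    found = []
    for g in sorted(G):
        gH = frozenset(g * h % p for h in H)
        if gH not in found:       # cosets sharing an element are identical
            found.append(gH)
    return H, found

H, parts = cosets(13, 3)
print(sorted(H))                        # [1, 3, 9]
print([sorted(c) for c in parts])       # four disjoint cosets of size 3
```

Here |G| = 12 and |H| = 3, and the 12 elements fall into 12/3 = 4 cosets.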
Exercises


In the next section we use Lagrange’s theorem to prove a famous theorem about 
congruence mod p. For readers not yet comfortable with group theory, the fol- 
lowing exercises pave the way for a more direct proof using a minimum of infor- 
mation about inverses. Their content is a special case of the example preceding 
the proof of Lagrange’s theorem—that multiplying a group by one of its elements 
reproduces the same set of elements. 

Suppose that $a \not\equiv 0 \pmod p$, that is, a is not a multiple of p. Thus a has an
inverse, mod p. Use it! 


3.3.1 Show that $ia \equiv 0 \pmod p \Rightarrow i \equiv 0 \pmod p$.
3.3.2 Show that $ia \equiv ja \pmod p \Rightarrow i \equiv j \pmod p$.
3.3.3 Deduce from Exercises 3.3.1 and 3.3.2 that a, 2a, 3a, ..., (p − 1)a are
distinct and nonzero, mod p, and hence that

$\{a, 2a, 3a, \ldots, (p-1)a\} = \{1, 2, 3, \ldots, p-1\} \pmod p.$


3.3.4 Verify the result of Exercise 3.3.3 in the case p = 7, a = 2. 


3.4 Fermat’s little theorem 


If we form powers $a, a^2, a^3, a^4, \ldots$ of any nonzero element a, mod p, then
eventually there will be a repeated value, say


$a^{m+n} \equiv a^m \pmod p.$

Multiplying both sides by the inverse of $a^m$, mod p, then gives

$a^n \equiv 1 \pmod p.$


Thus in fact the series of powers always includes 1. For example, if we
take p = 5 and a = 2 and compute $2, 2^2, 2^3, 2^4, \ldots$ mod 5 we find that
$2^4 = 16 \equiv 1 \pmod 5$. The sequence of powers repeats the same finite sequence
$a, a^2, a^3, \ldots, a^{n-1}, 1$ forever and is therefore called cyclic.

From the group-theoretic point of view, the argument just given shows
that the powers of a nonzero element, mod p, form a subgroup of the group
(Z/pZ)*. (Associativity and the identity element are obvious and the
inverse of $a^k$ is $a^{n-k}$.) Lagrange’s theorem can then be applied, and it says
how the size of the subgroup, and hence the least exponent n for which
$a^n \equiv 1 \pmod p$, is related to p.
Fermat’s little theorem. If p is prime and $a \not\equiv 0 \pmod p$, then
$a^{p-1} \equiv 1 \pmod p.$


Proof. (Z/pZ)* has p − 1 members, the classes of 1, 2, 3, ..., p − 1, so the
size of any subgroup of (Z/pZ)* divides p − 1 by Lagrange’s theorem.
In particular, if $a \not\equiv 0 \pmod p$ and n ≥ 1 is the least exponent for which
$a^n \equiv 1 \pmod p$, then the powers of the class of a form a subgroup with n
members, and hence n divides p − 1.
But if
$a^n \equiv 1 \pmod p$


and n divides p − 1 (say, p − 1 = mn) then


$a^{p-1} = a^{mn} = (a^n)^m \equiv 1^m \equiv 1 \pmod p.$ □


Application: a formula for the inverse mod p 


It follows from Fermat’s little theorem that, for any $a \not\equiv 0 \pmod p$,


$a^{p-2} \cdot a \equiv 1 \pmod p,$

hence $a^{p-2}$ is the inverse of a, mod p. This is not only an explicit formula
for the inverse mod p, it also implies an efficient method to compute it,
competitive with the Euclidean algorithm method of the previous section.
We know from Section 1.5 that $a^{p-2}$ can be computed in about log p
multiplications, and here the numbers to be multiplied are < p, since we are
working mod p. Compare this with finding the inverse of a by the method
of Section 3.3: using the Euclidean algorithm to express 1 = gcd(a, p)
in the form ma + np, which gives m as the inverse of a, mod p. This
involves about log p divisions with remainder (plus some other, less
time-consuming, arithmetic), again on numbers < p. Since division takes about
the same time as multiplication, the two methods are of similar speed.
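
The formula can be sketched with the usual repeated-squaring loop. The function name `inverse_by_fermat` is hypothetical, and the loop is one standard way to reach about log p multiplications, not necessarily the exact procedure of Section 1.5.

```python
def inverse_by_fermat(a, p):
    """Compute a^(p-2) mod p by repeated squaring; the inverse of a when p is prime."""
    result, base, e = 1, a % p, p - 2
    while e > 0:
        if e & 1:                      # this binary digit of the exponent is 1
            result = result * base % p
        base = base * base % p         # square, reducing mod p to keep numbers < p
        e >>= 1
    return result

p = 101
print(all(a * inverse_by_fermat(a, p) % p == 1 for a in range(1, p)))   # True
```

Python's built-in `pow(a, p - 2, p)` performs the same modular exponentiation and can be used as a cross-check.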


Primitive roots 


The minimum positive integer n such that $a^n \equiv 1 \pmod p$ is called the
order of a in (Z/pZ)*. The proof tells us that the order of any nonzero a,
mod p, is a divisor of p − 1. There is always an a of order exactly p − 1,
called a primitive root for p. Its existence was conjectured by Euler and
first proved by Gauss (1801). Primitive roots do not play an important
role in this book, though sometimes they throw light on results provable by
other means. Thus their properties and proof of existence are not essential 
reading, but they are in the starred sections at the end of this chapter. 
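
For small primes, orders and primitive roots can be found by direct search. The sketch below uses hypothetical function names and the prime p = 11 as an arbitrary example; it is a brute-force illustration, not an efficient method.

```python
def order(a, p):
    """Least n >= 1 with a^n ≡ 1 (mod p), for a not divisible by the prime p."""
    n, x = 1, a % p
    while x != 1:
        x = x * a % p
        n += 1
    return n

def primitive_roots(p):
    """All a in 1, ..., p-1 whose order is exactly p - 1."""
    return [a for a in range(1, p) if order(a, p) == p - 1]

print([order(a, 11) for a in range(1, 11)])   # every order divides 10
print(primitive_roots(11))                    # [2, 6, 7, 8]
```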


Exercises 


We now complete the proof of Fermat’s little theorem begun in the previous exer- 
cise set. 


3.4.1 Deduce from Exercise 3.3.3 that
$a^{p-1} \times 1 \times 2 \times 3 \times \cdots \times (p-1) \equiv 1 \times 2 \times 3 \times \cdots \times (p-1) \pmod p.$
3.4.2 Exercise 3.4.1 implies that $a^{p-1} \equiv 1 \pmod p$. Why?

Now for a few simple exercises on primitive roots.


3.4.3 Show that 2 is a primitive root for 5 but not for 7. 


3.4.4 Find a primitive root for 7. 


3.4.5 Given the existence of a primitive root for p, show that every divisor of p − 1
occurs as the order of some element of (Z/pZ)*.


3.5 Congruence theorems of Wilson and Lagrange


Another useful application of inverses mod p is the following theorem, 
which actually evaluates the product $(p-1)! = 1 \times 2 \times 3 \times \cdots \times (p-1)$
used in some proofs of Fermat’s little theorem. It will be useful to know 
the value of (p — 1)! mod p in Section 9.8, when we come to the law of 
quadratic reciprocity. The theorem is credited to Wilson (and it may in fact 
have been discovered by Ibn al-Haytham in the 10th century), but the first 
known proof is due to Lagrange. 


Wilson’s theorem. If p is prime then $(p-1)! \equiv -1 \pmod p$.


Proof. In this congruence the factors 1, 2, 3, ..., p − 1 all have inverses
mod p, hence each is cancelled by its own inverse except the factors that
are inverse to themselves.

Such self-inverse factors x are 1 and $p - 1 \equiv -1 \pmod p$, and no others,
because if $x^2 \equiv 1 \pmod p$ we have


$x^2 - 1 = (x-1)(x+1) \equiv 0 \pmod p.$

In other words, p divides (x − 1)(x + 1). But then p divides x − 1 or p
divides x + 1 by the prime divisor property, hence


$x \equiv 1 \pmod p$ or $x \equiv -1 \pmod p$, as claimed.

Thus the product (p − 1)! is $\equiv -1 \pmod p$, as required. □
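
The theorem is easy to confirm for small primes. The sketch below uses hypothetical function names and reduces mod n after each multiplication so that the numbers stay small; it also shows, for the small range tested, that the congruence fails for the composite numbers.

```python
def factorial_mod(n):
    """(n-1)! mod n, multiplying in one factor at a time and reducing mod n."""
    result = 1
    for k in range(2, n):
        result = result * k % n
    return result

def wilson_says_prime(n):
    """True when (n-1)! ≡ -1 (mod n)."""
    return n > 1 and factorial_mod(n) == n - 1

print([n for n in range(2, 40) if wilson_says_prime(n)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
```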

The fact that the congruence $x^2 - 1 \equiv 0 \pmod p$ has at most two solutions
has an important generalization due to Lagrange.

Lagrange’s polynomial congruence theorem. If P(x) is a polynomial of
degree n with integer coefficients, and p is prime, then the congruence

$P(x) \equiv 0 \pmod p$

has at most n incongruent solutions, mod p.

Proof. If there is no solution, we are done. Otherwise, suppose $P(r) \equiv 0 \pmod p$, where


$P(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$ and $a_n, a_{n-1}, \ldots, a_0 \in \mathbb{Z}$.


This implies


$P(x) \equiv P(x) - P(r) \pmod p$
$\equiv a_n (x^n - r^n) + a_{n-1} (x^{n-1} - r^{n-1}) + \cdots + a_1 (x - r) \pmod p$
$\equiv (x - r) Q(x) \pmod p$ (*)

where Q(x) is the polynomial of degree n − 1 that remains when x − r has
been extracted from each of $x^n - r^n$, $x^{n-1} - r^{n-1}$, ..., $x - r$ using the identity


$x^k - r^k = (x - r)(x^{k-1} + x^{k-2} r + \cdots + x r^{k-2} + r^{k-1}).$

It follows from (*) and the prime divisor property that the congruence
$P(x) \equiv 0 \pmod p$ implies

$x - r \equiv 0 \pmod p$ or $Q(x) \equiv 0 \pmod p$.


Since Q(x) has degree n − 1, we can assume inductively that the congruence
$Q(x) \equiv 0 \pmod p$ has at most n − 1 incongruent solutions. Then


$P(x) \equiv (x - r) Q(x) \equiv 0 \pmod p$

has at most n incongruent solutions (namely $x \equiv r$ and the solutions of
$Q(x) \equiv 0 \pmod p$), as required. □


Two important uses of this theorem are to prove the existence of prim- 
itive roots for p (Section 3.9), and to prove Euler’s criterion for squares 
mod p (Section 9.3). 
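
The bound can be observed by listing solutions directly. In the sketch below the function name `roots_mod` and the sample polynomials are choices made for the illustration; the last line shows that the bound can fail when the modulus is not prime.

```python
def roots_mod(coeffs, n):
    """Incongruent solutions x of P(x) ≡ 0 (mod n); coeffs = [a_n, ..., a_1, a_0]."""
    def value(x):
        total = 0
        for c in coeffs:              # Horner's rule, reducing mod n at each step
            total = (total * x + c) % n
        return total
    return [x for x in range(n) if value(x) == 0]

print(roots_mod([1, 0, -1], 7))       # x^2 - 1 mod 7: [1, 6]
print(roots_mod([1, 0, -1, 0], 5))    # x^3 - x mod 5: [0, 1, 4]
print(roots_mod([1, 0, -1], 8))       # x^2 - 1 mod 8: [1, 3, 5, 7]
```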
Exercises


Wilson’s theorem actually gives a criterion for a natural number n to be prime. 


3.5.1 If n > 4 is not prime, show that n divides (n - 1)!, that is, (n - 1)! ≡ 0 (mod n).
(For n = 4 we have 3! ≡ 2 (mod 4), which is still not ≡ -1.)


Unfortunately, the criterion has no practical value when n is large (say, 100
digits) because in this case we have no feasible way to compute (n - 1)! mod n.


3.6 Inverses mod k 


It is not always true that an a ≢ 0 (mod k) has an inverse mod k.
For example, 2 ≢ 0 (mod 4) but

2 × 2 = 4 ≡ 0 (mod 4).

Thus 2 has no inverse, for if it did we could multiply both sides of

2 × 2 ≡ 0 (mod 4)

by the inverse of 2 and get the false result 2 ≡ 0 (mod 4).


Criterion for existence of an inverse, mod k. An integer a has an inverse 
mod k if and only if gcd(a,k) = 1. 


Proof. If gcd(a,k) = 1 then, by Section 2.3,

gcd(a,k) = 1 = ma + nk for some m, n ∈ Z.

This says that

ma ≡ 1 (mod k),

so m is an inverse of a, mod k.

Conversely, if m is an inverse of a, mod k, then

ma ≡ 1 (mod k).

Hence

ma + nk = 1 for some m, n ∈ Z.

This implies gcd(a,k) = 1, because any common divisor of a and k also
divides ma + nk, which equals 1. □

If a_1 and a_2 have inverses m_1 and m_2 mod k, then a_1 a_2 has inverse
m_1 m_2. It follows that the elements with inverses mod k form a set closed
under multiplication and hence a group, which is called (Z/kZ)*. (The
group properties may be checked as they were for (Z/pZ)* in Section
3.3.)
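The proof of the criterion is constructive: the coefficient m in gcd(a,k) = ma + nk is the inverse. A minimal sketch of that computation follows, assuming the usual extended Euclidean algorithm; the function names extended_gcd and inverse_mod are ours, not the book's.

```python
def extended_gcd(a, b):
    """Return (g, m, n) with g = gcd(a, b) = m*a + n*b."""
    old_r, r = a, b
    old_m, m = 1, 0
    old_n, n = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_m, m = m, old_m - q * m
        old_n, n = n, old_n - q * n
    return old_r, old_m, old_n

def inverse_mod(a, k):
    """Inverse of a mod k, or None when gcd(a, k) != 1."""
    g, m, _ = extended_gcd(a % k, k)
    return m % k if g == 1 else None

# The invertible elements mod 8 form the group (Z/8Z)*.
units_mod_8 = [a for a in range(1, 8) if inverse_mod(a, 8) is not None]
print(units_mod_8, inverse_mod(2, 4))
```

Here inverse_mod(2, 4) returns None, matching the example of 2 mod 4 above.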


Example. (Z/8Z)* 


1 has inverse 1, 3 has inverse 3, 5 has inverse 5 ≡ -3, 7 has inverse 7 ≡ -1,
and it can be checked that these are the only invertible elements. Thus
(Z/8Z)* is an abelian group with four elements. It is not cyclic, because
each of its elements has order ≤ 2.


The size of (Z/kZ)*, that is, the number of elements a among

1, 2, 3, ..., k - 1

such that gcd(a,k) = 1, is denoted by φ(k) and is called the Euler phi
function. For example, φ(8) = 4 because the four elements 1, 3, 5, 7 are
the only natural numbers a < 8 for which gcd(a,8) = 1.

Certain properties of φ are known, for example

• φ(p^i) = p^{i-1}(p - 1) for p prime,

• φ(mn) = φ(m)φ(n) if gcd(m,n) = 1.

These make it easy to compute φ(k) if the prime factorization of k is
known, but otherwise it is difficult.
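The two properties can be combined into a direct computation of φ(k) from a known factorization. The sketch below compares that with the defining count; the function names and the dictionary format {prime: exponent} are assumptions made for this illustration.

```python
from math import gcd

def phi_by_definition(k):
    """Count the a in 1..k-1 with gcd(a, k) = 1 (as in the text, for k > 1)."""
    return sum(1 for a in range(1, k) if gcd(a, k) == 1)

def phi_from_factorization(factors):
    """phi(k) for k given as {prime: exponent}, using the two properties."""
    result = 1
    for p, i in factors.items():
        result *= p ** (i - 1) * (p - 1)
    return result

# 360 = 2^3 * 3^2 * 5, so phi(360) = 4 * 6 * 4 = 96
print(phi_from_factorization({2: 3, 3: 2, 5: 1}), phi_by_definition(360))
```

The first function needs only the factorization; the second takes about k gcd computations, which is hopeless when k has hundreds of digits.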


If we apply Lagrange’s theorem to an element a of (Z/kZ)*, exactly 
as we did to an element a of (Z/pZ)* in Section 3.4, then we obtain the 
following. 


Euler’s theorem. If a is invertible mod k then

a^{φ(k)} ≡ 1 (mod k).


Proof. We use the same argument as for Fermat’s little theorem, except
that now we use the fact that the size of the group (Z/kZ)* is φ(k). □


As Fermat’s little theorem does for k = p, Euler’s theorem gives a
formula for the inverse of a, mod k, namely a^{φ(k)-1}. The formula for
general k is not quite so explicit because it involves the φ function. This blocks
the computation of the inverse by exponentiation mod k because there is no
efficient way known to compute φ(k). In fact, the difficulty of computing
φ(k) is important for the security of the famous RSA cryptosystem studied
in the next chapter.
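For small k, where φ(k) can simply be counted, the inverse formula is easy to test. This is only a sketch: phi is computed by brute force, which is exactly the step that is infeasible for large k, and the name inverse_by_euler is ours.

```python
from math import gcd

def phi(k):
    """Euler phi by direct count (fine for small k only)."""
    return sum(1 for a in range(1, k) if gcd(a, k) == 1)

def inverse_by_euler(a, k):
    """a^(phi(k) - 1) mod k, which inverts a when gcd(a, k) = 1."""
    return pow(a, phi(k) - 1, k)

k = 20
for a in range(1, k):
    if gcd(a, k) == 1:
        assert pow(a, phi(k), k) == 1              # Euler's theorem
        assert a * inverse_by_euler(a, k) % k == 1 # the inverse formula
print(inverse_by_euler(3, 20))
```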

Exercises


The formula φ(p^i) = p^{i-1}(p - 1) (and its special case when i = 1) may be shown
as follows.

3.6.1 Explain why φ(p) = p - 1 when p is prime.

3.6.2 Show that there are p^{i-1} multiples of p among the numbers 1, 2, 3, ..., p^i.

3.6.3 Deduce that φ(p^i) = p^{i-1}(p - 1) when p is prime.


The formula φ(mn) = φ(m)φ(n) when gcd(m,n) = 1 is proved in Section
9.7. For the time being we consider just a simple case.


3.6.4 Verify that φ(15) = φ(3)φ(5).


3.7 Quadratic Diophantine equations


The behavior of quadratic Diophantine equations is much more complex 
than that of the linear Diophantine equations discussed in the last chapter. 
However, congruences are a good tool for showing that certain equations 
do not have solutions of a certain form. 


Example 1. x^2 + y^2 = p has no solution for p of the form 4n + 3.


This statement is equivalent to x^2 + y^2 ≢ 3 (mod 4), which we can prove
by trying the finitely many values of x and y (mod 4). These are x, y ≡ 0,
1, 2, -1, for which we have x^2, y^2 ≡ 0, 1.

It follows that x^2 + y^2 ≡ 0, 1, 2 (mod 4), and so x^2 + y^2 ≢ 3 (mod 4), as
claimed. □


Example 2. x^2 + 2y^2 = p has no solution for p of the form 8n + 5, 8n + 7.


This statement is equivalent to x^2 + 2y^2 ≢ 5, 7 (mod 8), which we can
prove by trying the finitely many values of x and y (mod 8). These are
x, y ≡ 0, 1, 2, 3, 4, -3, -2, -1, for which we have x^2, y^2 ≡ 0, 1, 4.

It follows that x^2 + 2y^2 ≡ 0, 1, 2, 3, 4, 6 (mod 8), so x^2 + 2y^2 ≢ 5, 7
(mod 8), as claimed. □


Example 3. x^2 + 3y^2 = p has no solution for p of the form 3n + 2.


This statement is equivalent to x^2 + 3y^2 ≢ 2 (mod 3), which we can
prove by trying the finitely many values of x and y (mod 3). These are
x, y ≡ 0, 1, -1, for which we have x^2, y^2 ≡ 0, 1.

It follows that x^2 + 3y^2 ≡ x^2 ≡ 0, 1 (mod 3), so x^2 + 3y^2 ≢ 2 (mod 3),
as claimed. □
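The three case checks above are finite, so they can be repeated mechanically. The following sketch lists every residue that x^2 + c*y^2 takes mod k; the function name values_mod is an illustrative choice.

```python
def values_mod(c, k):
    """Residues taken by x^2 + c*y^2 mod k, over all x, y mod k."""
    return sorted({(x * x + c * y * y) % k for x in range(k) for y in range(k)})

print(values_mod(1, 4))   # Example 1 -> [0, 1, 2]
print(values_mod(2, 8))   # Example 2 -> [0, 1, 2, 3, 4, 6]
print(values_mod(3, 3))   # Example 3 -> [0, 1]
```

The missing residues (3 mod 4; 5 and 7 mod 8; 2 mod 3) are exactly the excluded forms of p.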

These three results were first claimed by Fermat, though he credited
them to his secret weapon, the “method of descent”, apparently
overlooking the easy congruence proofs. Descent is much heavier artillery (we use
it in Section 7.7 to prove that x^3 + y^3 ≠ z^3 for natural numbers x, y and z)
and Fermat used it appropriately to prove more difficult complements of
the results just mentioned. For example, while x^2 + y^2 never takes a prime
value of the form 4n + 3 (by the argument above), it takes every prime value
of the form 4n + 1.

Fermat became interested in primes of the form x^2 + y^2, x^2 + 2y^2 and
x^2 + 3y^2 (which is why we denoted the right-hand side of the equations
above by p) after reading a remark of Diophantus (Arithmetica, Book III,
Problem 19):


65 is naturally divided into two squares in two ways, namely into
7^2 + 4^2 and 8^2 + 1^2, which is due to the fact that 65 is the product of
13 and 5, each of which is the sum of two squares.


Evidently Diophantus was aware of the formula

(a_1^2 + b_1^2)(a_2^2 + b_2^2) = (a_1 a_2 ± b_1 b_2)^2 + (b_1 a_2 ∓ a_1 b_2)^2,

which shows that the product of sums of two squares is itself the sum of
two squares (in two different ways, corresponding to the choice of sign on
the right-hand side).
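Diophantus' example can be reproduced from the formula. In this sketch the function name compose is ours, and absolute values are taken only so that the representations appear with nonnegative entries.

```python
def compose(a1, b1, a2, b2):
    """Two representations of (a1^2 + b1^2)(a2^2 + b2^2) as a sum of two squares."""
    first = (abs(a1 * a2 + b1 * b2), abs(b1 * a2 - a1 * b2))
    second = (abs(a1 * a2 - b1 * b2), abs(b1 * a2 + a1 * b2))
    return first, second

# 13 = 3^2 + 2^2 and 5 = 2^2 + 1^2, so 65 = 13 * 5
for x, y in compose(3, 2, 2, 1):
    print(x, y, x * x + y * y)
```

The two lines of output give 8^2 + 1^2 and 4^2 + 7^2, the two divisions of 65 quoted above.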

Fermat saw what this implies: knowing which natural numbers are
sums of two squares depends on knowing which primes are sums of two
squares. The easy congruence argument in Example 1 shows that primes
of the form 4n + 3 are not sums of two squares; the hard part is to show
that all primes of the form 4n + 1 are sums of two squares. The
theorem became something of a showcase for new methods in number theory,
with Lagrange, Gauss and others using it to show off their innovations. In
Chapter 6 we give a proof using the Gaussian integers, due to Dedekind.

It is also true that the primes of the form x^2 + 2y^2 are precisely those
of the forms 8n + 1 and 8n + 3 not ruled out by the congruence arguments
above (of course, numbers of the form 8n + 2, 8n + 4, or 8n + 6 are not
primes because they are divisible by 2). And likewise, the primes of the
form x^2 + 3y^2 are those of the form 3n + 1. We prove these results later by
combining results from Chapter 7 and Chapter 9.

Exercises


It is entertaining to test Fermat’s two square theorem on the first few primes of the
form 4n + 1 and to investigate his corresponding theorems on primes of the form
8n + 1, 8n + 3, and 3n + 1.


3.7.1 Write down the first 10 primes of the form 4n + 1 and check that each of
them is a sum of two squares. (The first is 5 = 2^2 + 1^2.)

3.7.2 Is any of these the sum of squares in two different ways?

3.7.3 Write down the first 10 primes of the form 8n + 1 or 8n + 3 and check that
each of them is of the form x^2 + 2y^2. (And see whether any of them is of
this form in two different ways.)

3.7.4 Write down the first 10 primes of the form 3n + 1 and check that each of
them is of the form x^2 + 3y^2. (And see whether any of them is of this form
in two different ways.)


An interesting and puzzling phenomenon in elementary arithmetic is the 
period in the decimal expansion of 1/n. For example, we know that 


1/3 = 0.3333--- 
1/7 = 0.142857 142857--- 
1/13 = 0.076923 076923 --- 


We say that the decimal of 1/3 has period length 1 because the 1-digit
pattern 3 repeats; 1/7 has period length 6 because the 6-digit pattern 142857
repeats; and 1/13 likewise has period length 6 because the 6-digit pattern
076923 repeats. It is clear from the ordinary school division process that
a repetition must eventually occur, so periodicity in the decimal expansion
of 1/n is not surprising. But why is the maximum period length n - 1, and
under what circumstances does this occur?

The main part of the answer is that period length n - 1 occurs when
10 has order n - 1 in the group (Z/nZ)*, that is, when 10^{n-1} is the least
positive power of 10 that is ≡ 1 (mod n). We also express this condition by
saying that 10 is a primitive root for n. A closer study of (Z/nZ)*, using
Euler’s theorem, then shows why n - 1 is the maximum possible period
length.


Example. 1/7 = 0.142857 142857...

If we multiply this equation by 10, 10^2, 10^3, ... then we obtain

10/7 = 1.42857 142857...
10^2/7 = 14.2857 142857...
...
10^6/7 = 142857.142857... = 142857 + 1/7

Thus 10^6, like 10^0 = 1, leaves remainder 1 on division by 7, and it is the
first among the positive powers of 10 with this property (because 10^i/7 has
a different decimal part for i = 1, 2, 3, 4, 5). This is precisely what it means
for 10 to be a primitive root for 7.

A generalization of this argument gives the following. 


Criterion for maximal period length. The decimal expansion of 1/n is 
periodic of length n — 1 precisely when 10 is a primitive root for n. Also, 
n - 1 is the maximum possible period length, occurring only when n is
prime. 


Proof. Suppose that 1/n has a periodic decimal expansion with period
length n - 1,

1/n = 0.a_1 a_2 ... a_{n-1} a_1 a_2 ... a_{n-1} ...

If we multiply this equation by 10, 10^2, 10^3, ... then we get

10/n = a_1 . a_2 a_3 ... a_{n-1} a_1 a_2 ... a_{n-1} ...
10^2/n = a_1 a_2 . a_3 ... a_{n-1} a_1 a_2 ... a_{n-1} ...
...
10^{n-1}/n = a_1 a_2 ... a_{n-1} . a_1 a_2 ... a_{n-1} ... = a_1 a_2 ... a_{n-1} + 1/n.

Thus 10^{n-1} is the first among the powers 10, 10^2, 10^3, ... that leaves
remainder 1 on division by n (because 10^i/n has different decimal part when
i < n - 1). That is, 10 has order n - 1 and hence is a primitive root for n.

Conversely, if 10 has order n - 1 in (Z/nZ)* then n is prime. This
follows from the proof of Euler’s theorem, which shows that the order of
any element of (Z/nZ)* is at most φ(n). It is clear from the definition
of the phi function that φ(n) ≤ n - 1, and that equality holds only if n is
prime.

It remains to show that the decimal of 1/n is periodic of length n - 1
when 10 has order n - 1. This follows by considering 1/n, 10/n, 10^2/n, ...,
10^{n-1}/n again. The assumption that 10 has order n - 1 implies that 10^{n-1}
leaves remainder 1 on division by n, so we have

10^{n-1}/n = a_1 a_2 ... a_{n-1} + 1/n,     (*)

where a_1 a_2 ... a_{n-1} is the decimal numeral consisting of the first n - 1 digits
of 1/n. Dividing both sides by 10^{n-1}, it follows that

1/n = 0.a_1 a_2 ... a_{n-1} a_1 a_2 ... a_{n-1} ...,     (**)

though it is not clear what appears after the first 2(n - 1) digits on the
right. Repeatedly substituting (**) back in (*) and dividing by 10^{n-1} we
find that the sequence a_1 a_2 ... a_{n-1} of digits recurs indefinitely.
This sequence defines the period length n - 1 of 1/n, because if there were
a period of shorter length k we could conclude as above that 10 has order
k < n - 1, contrary to assumption. □
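The link between the period of 1/n and the order of 10 mod n can be explored numerically. The sketch below assumes gcd(10, n) = 1; the function names order_of_10 and first_digits are ours, and the digits come from the ordinary school division process.

```python
from math import gcd

def order_of_10(n):
    """Least k >= 1 with 10^k ≡ 1 (mod n); requires gcd(10, n) = 1."""
    if gcd(10, n) != 1:
        raise ValueError("10 is not invertible mod n")
    k, power = 1, 10 % n
    while power != 1:
        power = power * 10 % n
        k += 1
    return k

def first_digits(n, count):
    """First `count` decimal digits of 1/n by school long division."""
    digits, remainder = [], 1
    for _ in range(count):
        remainder *= 10
        digits.append(remainder // n)
        remainder %= n
    return "".join(map(str, digits))

for n in (3, 7, 13):
    print(n, order_of_10(n), first_digits(n, 12))
```

For n = 7 the order is 6 = n - 1, so 10 is a primitive root for 7; for n = 13 the order is also 6, which is less than n - 1.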


In the exercises below you are asked to find a prime p > 7 for which 
10 is a primitive root for p, and hence 1/p has period length p—1. In 
1801 Gauss conjectured that the maximum period length p — 1 occurs for 
infinitely many primes p but it is still not known whether this is true. In 
fact, it is not known whether any specific number, say 2 or 3, is a primitive
root for infinitely many primes p. However, it is known that each prime p 
has a primitive root. We give a proof of this theorem in the next section. 


Exercises 


The decimal expansion of 1/n need not be periodic when n is not prime, for ex- 

riodic because it is periodic beyond a certain digit (in this case, beyond the first 

digit). 

3.8.1 Compute the decimal expansions of 1/12 and 1/14 by hand and verify that
they are ultimately periodic.

3.8.2 Explain in general why 1/n has an ultimately periodic decimal expansion.


The relationship between decimal expansions and powers of 10 allows us to
use properties of the decimal expansion of 1/n to predict properties of powers of
10, mod n, and vice versa.


3.8.3 Show, without using the decimal for 1/13, that 10 has order 6 in (Z/13Z)*.


3.8.4 Which is the first prime p > 7 for which 10 is a primitive root for p? Verify
that, in this case, the decimal for 1/p has period length p - 1.


3.9 *Existence of primitive roots


The existence of a primitive root for each prime p is a subtle theorem, 
because we do not know any uniform way to specify a primitive root as a 
function of p. The least primitive root, for example, seems to vary with 
p in a highly irregular way. All known proofs of the theorem get around 
this difficulty by showing only the existence of a primitive root without 
attempting to find it. 

The proofs use Lagrange’s polynomial congruence theorem from
Section 3.5, that the number of solutions of an nth degree congruence is ≤ n.
The theorem is used to show that, when n < p - 1, the congruences x^n ≡ 1
(mod p) have too few solutions to include all the p - 1 incongruent
numbers 1, 2, 3, ..., p - 1. Thus at least one of these numbers satisfies only
x^{p-1} ≡ 1 (mod p), and hence is a primitive root.

Together with this theorem, we use the proof of Fermat’s little theorem
from Section 3.4, which shows that each a ≢ 0 (mod p) satisfies a
congruence x^n ≡ 1 (mod p), where n divides p - 1. This yields the following
proposition on the number of solutions of x^n ≡ 1 (mod p).


Solutions of x^n ≡ 1 (mod p). The congruence x^n ≡ 1 (mod p) has at most
φ(n) solutions that are not solutions of a congruence x^m ≡ 1 (mod p) of
lower degree m < n.


Proof. If a satisfies x^n ≡ 1 (mod p) but no congruence x^m ≡ 1 (mod p) of
lower degree, then a is of order n. Then 1, a, a^2, ..., a^{n-1} are distinct
solutions of x^n ≡ 1 (mod p) and hence, by Lagrange’s polynomial congruence
theorem, they are the only solutions of x^n ≡ 1 (mod p).

Moreover, a power a^i such that gcd(i,n) > 1 satisfies the lower-degree
congruence x^{n/gcd(i,n)} ≡ 1 (mod p). Thus the number of solutions of x^n ≡ 1
(mod p) that do not satisfy lower-degree congruences x^m ≡ 1 (mod p) is at
most the number of i that are relatively prime to n, that is, φ(n). □


Finally, to prove the existence of primitive roots, we use this
proposition to show that there are not enough elements of orders n < p - 1 to
account for the p - 1 elements 1, 2, ..., p - 1. To shorten notation we use
a | b for “a divides b”.


Existence of primitive roots. Fewer than p - 1 of the elements 1, 2, ..., p - 1
have orders n < p - 1, hence one of them is a primitive root.


Proof. By the previous proposition, the total number of elements with
orders n < p - 1 is no more than

Σ_{n | p-1, n ≠ p-1} φ(n).

We can prove that this number is less than p - 1 by proving that

Σ_{n | p-1} φ(n) = p - 1,

because the omitted term φ(p - 1) is at least 1. In fact, it is true for any
natural number N that

Σ_{n | N} φ(n) = N.

To see why, consider the N fractions 1/N, 2/N, ..., N/N. Each of these has a
reduced form n'/n where gcd(n',n) = 1, obtained by dividing the top and
bottom by their gcd. For each divisor n of N there are φ(n) reduced forms
n'/n, and distinct fractions i/N and j/N have distinct reduced forms. Therefore

N = Σ_{n | N} φ(n),

as required. □
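Both the divisor sum and the conclusion of the theorem can be checked for a small prime. In this sketch φ(1) is taken to be 1 (the fraction N/N reduces to 1/1), and the helper names are illustrative choices rather than anything from the text.

```python
from math import gcd

def phi(n):
    """Euler phi by direct count; phi(1) = 1 by convention."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

def divisor_phi_sum(N):
    """Sum of phi(n) over the divisors n of N."""
    return sum(phi(n) for n in range(1, N + 1) if N % n == 0)

def order_mod_p(a, p):
    """Order of a in (Z/pZ)*, for a not divisible by the prime p."""
    k, power = 1, a % p
    while power != 1:
        power = power * a % p
        k += 1
    return k

p = 13
primitive_roots = [a for a in range(1, p) if order_mod_p(a, p) == p - 1]
print(divisor_phi_sum(p - 1), primitive_roots)
```

For p = 13 the sum is 12 and the search finds φ(12) = 4 primitive roots. The search is a brute-force check, not a method suggested by the proof, which is purely an existence argument.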


Exercise 
Here is another way to prove the existence of primitive roots, again assuming 
Lagrange’s polynomial congruence theorem. 


3.9.1 Suppose that the nonzero elements mod p have maximum order n < p - 1.
Show that this implies x^n ≡ 1 (mod p) for all the p - 1 nonzero values of x,
mod p, contrary to Lagrange’s polynomial congruence theorem.


3.10 Discussion 

The congruence concept was introduced by Gauss (1801), who was the 
first to recognize its value in simplifying arguments involving division with 
remainders, such as Fermat’s little theorem and Wilson’s theorem. For 
example, instead of having to say “p divides a^{p-1} with remainder 1”, one
can write a^{p-1} ≡ 1 (mod p), which looks and behaves like an equation.
Indeed, the concept of congruence class, introduced by Dedekind (1857),
allows the congruence

a ≡ b (mod n)

to be replaced by an actual equation

nZ + a = nZ + b,

between objects, nZ + a = {nk + a : k ∈ Z} and nZ + b = {nk + b : k ∈ Z},
that obey the rules of arithmetic. This was an important step toward modern
algebraic thinking, though ahead of its time, because few mathematicians
accepted the use of sets as mathematical objects until the 20th century.

Fermat’s little theorem grew out of the special case 2^{p-1} ≡ 1 (mod p),
discovered by Fermat in an investigation of perfect numbers and primes
of the form 2^p - 1. He actually stated the theorem in the equivalent form
2^p ≡ 2 (mod p), and proved it using properties of the binomial coefficients.
Fermat used neither the modern binomial theorem

(a + b)^p = a^p + (p choose 1) a^{p-1} b + (p choose 2) a^{p-2} b^2 + ... + (p choose p-1) a b^{p-1} + b^p

nor the formula

(p choose k) = p! / (k! (p - k)!)

but a similar proof is easily obtained from them. One simply notes that

• For k ≠ 0, p the integer (p choose k) has the prime factor p in its numerator
but not in its denominator. Hence p divides (p choose k).

• Therefore, by the binomial theorem,

2^p = (1 + 1)^p = 1^p + (p choose 1) + (p choose 2) + ... + (p choose p-1) + 1^p
    ≡ 2 (mod p)

since p divides each of (p choose 1), (p choose 2), ..., (p choose p-1).

The equivalent form, a^p ≡ a (mod p), of Fermat’s little theorem may then
be obtained by induction on a, since

3^p = (2 + 1)^p
    ≡ 2^p + 1^p (mod p) since (p choose 1), (p choose 2), ..., (p choose p-1) ≡ 0 (mod p)
    ≡ 2 + 1 (mod p) since 2^p ≡ 2 (mod p)
    ≡ 3 (mod p), and so on.
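The divisibility of the binomial coefficients, on which this proof rests, is easy to test by machine. A minimal sketch, with an illustrative function name; math.comb requires Python 3.8 or later.

```python
from math import comb

def middle_binomials_divisible(p):
    """True when p divides every binomial coefficient (p choose k), 0 < k < p."""
    return all(comb(p, k) % p == 0 for k in range(1, p))

print([p for p in range(2, 30) if middle_binomials_divisible(p)])
print([pow(a, 13, 13) for a in range(13)])   # a^p ≡ a (mod p) for p = 13
```

The first line printed lists exactly the primes below 30, and the second reproduces 0, 1, ..., 12.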

Around 1750 Euler gave a proof of Fermat’s little theorem that
foreshadows the proof of Lagrange’s theorem (20 years before Lagrange’s own
proof, which itself was not expressed in terms of groups; the group concept
was introduced around 1830 by Galois).

Given a ≢ 0 (mod p), let {1, a, a^2, ..., a^{n-1}} be the set of distinct
powers of a (which we recognize as a group A). Euler then shows that the
sets {b, ab, a^2 b, ..., a^{n-1} b}, for b ≢ 0 (mod p) (which
we recognize as the cosets of A) form a partition of the set {1, 2, ..., p - 1}.
Hence the order n of a, which is the size of each coset, divides p - 1.
Euler used a similar argument to prove his generalization of Fermat’s little
theorem. More on the early history of Fermat’s little theorem and Euler’s
theorem may be found in Weil (1984).

Primes of the form x^2 + ny^2 are an important thread in the history of
number theory, and we return to them several times in this book. The case
n = 1 originates with Diophantus (if not earlier, in the study of Pythagorean
triples) and his remark on products of sums of squares that we discussed
in Section 3.7. By 1640 Fermat had completely mastered this case by
reducing it to the question of which primes are of the form x^2 + y^2, showing
that they are precisely the primes of the form 4n + 1 (together with the
obvious exceptional prime 2). We do not know how he proved it, except
that he used descent, which was also the method of the first known proof,
by Euler (1755). By 1654 Fermat had similarly dealt with primes of the
form x^2 + 2y^2 and x^2 + 3y^2. As we saw in Section 3.7, it is easy to show
that certain congruence classes are not of the required form. More
powerful methods are required to show that other congruence classes are of the
required form. We pick up this story again in Chapter 6.

The partial success of congruence arguments with the forms x^2 + y^2,
x^2 + 2y^2, and x^2 + 3y^2 is not simply good luck. It can be explained by
a sweeping general principle discovered by Hasse (1923) and called the
Hasse-Minkowski principle. The principle implies that the impossibility of
certain values for quadratic forms ax^2 + bxy + cy^2 can always be verified
by congruence arguments.


PREVIEW 


The commonest application of number theory, and perhaps the most 
ubiquitous application of any kind of advanced mathematics, is the 
RSA cryptosystem. In this chapter we describe the system and how 
it works, based on a few key ideas from previous chapters. 


The only theoretical ideas required are those of inverses mod n, the
Euler φ function, and the related Euler theorem a^{φ(n)} ≡ 1 (mod n).
Allied with this are two fundamental algorithms: the algorithm for
computing binary numerals, and the Euclidean algorithm (in the
version that gives the inverse of a, mod b).

Thanks to the binary numeral algorithm, exponentiation mod n is
feasible for large exponents. A “message” (viewed as an integer m)
is encrypted as m^e mod n for certain publicly known e and n; and
decrypted by raising the result to a power d, inverse to e mod φ(n).
This makes decryption easy only for someone who knows φ(n).


4.1 Trapdoor functions


The science of cryptography seeks methods for encoding or encrypting 
messages, and corresponding methods for decoding or decrypting. Typi- 
cally, encryption uses a certain key number (which may have many digits) 
and the same number is used for decryption. Without the key, it is not pos- 
sible to read encrypted messages, so the security of the system depends on 
the difficulty of finding the key. Two well-known methods of encryption 
(at opposite ends of the security spectrum) are the following.

Example 1. The Caesar cipher.


This method of encryption (thought to have been used by Julius Caesar)
simply adds the same integer key number (mod 26) to each letter in the
message (viewed as a number between 1 and 26, assuming the Roman
alphabet is used).

For example, if the key number is 3 then the message


Go to Zagreb tomorrow
is encrypted as
Jr wr Cdjuhe wrpruurz


and the latter is decrypted by subtracting 3 (mod 26) from each letter.

The Caesar cipher has low security because there are only 26 possible
keys. It does not take an opponent very long to find the correct one,
simply by trying the keys 1, 2, 3, ... until one of them produces an
intelligible message.
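The example is easily reproduced by machine. In this sketch the function name caesar is ours, and letters are numbered 0 to 25 internally rather than 1 to 26, which gives the same shifted alphabet.

```python
def caesar(text, key):
    """Shift each letter by `key` places (mod 26), leaving other characters alone."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr(base + (ord(ch) - base + key) % 26))
        else:
            out.append(ch)
    return "".join(out)

secret = caesar("Go to Zagreb tomorrow", 3)
print(secret)               # Jr wr Cdjuhe wrpruurz
print(caesar(secret, -3))   # Go to Zagreb tomorrow
```

Decryption is the same function with the key negated, so an opponent needs at most 26 calls to try every key.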


Example 2. The one-time pad. 


In this method the key is a long, random sequence x_1, x_2, x_3, ... of numbers
x_i, each between 1 and 26. The number x_i is added (mod 26) to the ith letter of
the message to produce the encrypted message, and the receiver similarly
subtracts x_i (mod 26) to recover the message. Once a segment x_1 x_2 x_3 ... x_k
of the key has been used for a message it is “torn off the pad”, that is, the
next portion x_{k+1} x_{k+2} ... is used for the next message.

The one-time pad is completely secure (short of actually capturing a
copy of the key) because all sequences x_1 x_2 x_3 ... are equally likely, and
hence so are all messages. There is no point even trying to guess the key.
However, the key needs to be extremely long, since each segment of it is
used only once, and this is inconvenient in practice.
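A one-time pad differs from the Caesar cipher only in using a fresh random key number for each letter. The following sketch assumes a message written in capital letters without spaces; the function names are ours, and Python's secrets module stands in for whatever source of randomness the pad would really come from.

```python
import secrets

def random_pad(length):
    """A fresh key: `length` independent numbers between 1 and 26."""
    return [secrets.randbelow(26) + 1 for _ in range(length)]

def shift_letters(letters, pad, sign):
    """Add (sign=+1) or subtract (sign=-1) the pad, letter by letter, mod 26."""
    out = []
    for ch, x in zip(letters, pad):
        out.append(chr(ord("A") + (ord(ch) - ord("A") + sign * x) % 26))
    return "".join(out)

message = "GOTOZAGREBTOMORROW"
pad = random_pad(len(message))          # as long as the message, used once
cipher = shift_letters(message, pad, +1)
print(cipher, shift_letters(cipher, pad, -1))
```

The encrypted text changes on every run, while subtracting the same pad always recovers the message.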


The dream of cryptography has always been ease of implementation 
(as in the Caesar cipher) combined with security (as in the one-time pad), 
or at least a compromise between the two: it should be feasible to encrypt 
the message, but not feasible (without a reasonably short key) to decrypt it. 
Throughout history, this dream has failed time and time again, but it was 
revived in the 1970s in the mathematically more precise form of trapdoor 
functions. 

A trapdoor function is an operation that is easy to do but hard to undo,
like falling through a trapdoor or scrambling eggs. But unlike these
real-life examples, a trapdoor function is supposed to be easy to undo with
the help of a “key”. Such functions seem to exist in mathematics, and
the theory of polynomial time computability has been developed to discuss
them. Here we illustrate these concepts with the example most important
for cryptography.

If we take two large prime numbers, say

p_1 = 4575163

and

p_2 = 4093567,

then we can easily find their product

p_1 p_2 = 18728736276421

(even using the school method of multiplication, which takes around n^2
steps for a pair of n-digit numbers).

Yet if we give someone the number 18728736276421 and ask them to
find the factors, it will probably take around a million steps. This is because
no known method for finding a divisor of a 2n-digit number is substantially
quicker than trying to divide it by all 10^n numbers of ≤ n digits.

Thus the function f(p_1, p_2) = p_1 p_2 of numbers p_1, p_2 can be computed
in “quadratic time” but the inverse process of factorization seems to require
“exponential time”. (These concepts can be made completely precise by
formalizing the concept of computation, but an informal understanding of
computing will suffice for our purposes.)
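The contrast between the two directions can be felt even at this small scale. The sketch below multiplies the two numbers and then recovers a factor by naive trial division; the function name smallest_factor and the trial counter are illustrative, and the method is the crude one described above, not a state-of-the-art factoring algorithm.

```python
def smallest_factor(n):
    """Smallest divisor d > 1 of n, found by trial division; also counts trials."""
    trials = 0
    d = 2
    while d * d <= n:
        trials += 1
        if n % d == 0:
            return d, trials
        d += 1 if d == 2 else 2   # after 2, try odd numbers only
    return n, trials

p1, p2 = 4575163, 4093567
n = p1 * p2                       # one multiplication
factor, trials = smallest_factor(n)
print(n, factor, trials)
```

The multiplication is instantaneous, while the search performs on the order of millions of trial divisions before it finds a factor.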

The seemingly hard-to-reverse property of multiplication is the basis
of the most commonly used cryptographic method today, the RSA system.
The system is named after the initials of the three mathematicians who first
published the system in 1978: Rivest, Shamir, and Adleman. It consists of

• an encryption function E(m) of messages m that involves the product
p_1 p_2, where p_1 and p_2 are two large primes,

• a decryption function D(m) that involves the two primes p_1 and p_2
separately.

The encryption function is easily computed from the message and the
“key” k = p_1 p_2 but the decryption function is not: it seems to
require factorization of the key to extract the primes p_1 and p_2. Because of
the apparent difficulty of factorization the key k can be made public,
making E(m) easy for everybody to compute, while D(m) is easy to compute
only for those who know p_1 and p_2.

Thus E(m) is apparently a trapdoor function. We have to say
“apparently”, because no one has yet proved the underlying claim that
factorization is hard. In view of the enormous number of communications that use
RSA (military, commercial and private), this is an extremely important
question. Regardless of what its answer turns out to be, the influence of
RSA on number theory alone is enough to justify a short chapter on the
subject.


4.2 Ingredients of RSA

A user of RSA owns a couple of large prime numbers, p_1 and p_2. If p_1 and
p_2 are of, say, 100 digits, then the product p_1 p_2 can be computed in around
100^2 steps by the ordinary school method of multiplication. The product
p_1 p_2 then has a unique factorization into two smaller factors, namely p_1
and p_2, but no known method of finding them is substantially better than
dividing the 200-digit number p_1 p_2 by most of the approximately 10^100
numbers less than its square root.

Thus the user can safely reveal the product n = p_1 p_2 without revealing
its factors p_1 and p_2.

The theoretical ingredients of the RSA cryptosystem are inverses mod
k and Euler’s theorem, which we already have. The only other result we
need is

φ(p_1 p_2) = (p_1 - 1)(p_2 - 1) for distinct primes p_1, p_2.     (*)

To prove (*) we ask how many natural numbers a < p_1 p_2 there are with
gcd(a, p_1 p_2) = 1. The only a for which this is not the case are the p_2 - 1
multiples of p_1 and the p_1 - 1 multiples of p_2. These (p_1 + p_2) - 2 numbers
are distinct because p_1 p_2 is the smallest natural number that is a multiple
of both p_1 and p_2. Hence

φ(p_1 p_2) = p_1 p_2 - 1 - (p_1 + p_2) + 2
           = p_1 p_2 - p_1 - p_2 + 1
           = (p_1 - 1)(p_2 - 1). □
Knowing the primes p_1 and p_2, the user of RSA can easily compute
n = p_1 p_2 and φ(n) = (p_1 - 1)(p_2 - 1).

The user also chooses an encryption exponent e, which can be any
number with

gcd(e, φ(n)) = 1,

for example, a prime that does not divide φ(n). The numbers e and n are made
public, so anyone may use them to send encrypted messages to the user.

The value of φ(n), known only to the user, enables computation of the
decryption exponent d, which is the inverse of e, mod φ(n). As we know,
the inverse is easily computed from e and φ(n) by the Euclidean algorithm.

The mathematical core of the RSA system is the following proposition,
proved in Section 4.4. If d is the inverse of e, mod φ(n), then (m^e)^d ≡ m
(mod n). Here m is the message, encryption raises m to the power e, mod n,
and decryption recovers m by raising the encrypted message to the power
d, mod n.

Encryption and decryption are feasible because exponentiation mod n
is easy to compute. We explain why in the next section. The key to the
success of RSA is the presumed difficulty of factorization, which makes
φ(n) and d hard to compute for anyone who does not know the two primes
p_1 and p_2.
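The ingredients can be assembled for a toy key. This is only a sketch with unrealistically small primes; the variable and function names are ours, and the built-in pow(e, -1, phi_n) (Python 3.8 or later) is used in place of a hand-written Euclidean algorithm.

```python
from math import gcd

p1, p2 = 7, 11                 # toy primes (far too small for real use)
n = p1 * p2                    # public modulus
phi_n = (p1 - 1) * (p2 - 1)    # known only to the key owner

def valid_exponents(phi_n):
    """All e with 1 < e < phi_n and gcd(e, phi_n) = 1."""
    return [e for e in range(2, phi_n) if gcd(e, phi_n) == 1]

e = valid_exponents(phi_n)[0]  # smallest valid choice
d = pow(e, -1, phi_n)          # inverse of e mod phi(n); needs Python 3.8+
print(n, phi_n, e, d)
```

With these primes the smallest valid exponent is e = 7, and d = 43 because 7 × 43 = 301 = 5 × 60 + 1.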


Exercises 


To become familiar with the RSA system, take the (unrealistically small) primes
p_1 = 7 and p_2 = 11.


4.2.1 Explain why e = 5 is not a valid encryption exponent.


4.2.2 Show that e = 13 is a valid encryption exponent and compute the
corresponding decryption exponent d using the Euclidean algorithm.


4.2.3 Show that e = 61 is also a valid encryption exponent, but unsatisfactory
because it leaves every message unchanged: m^61 ≡ m (mod 77).


Such accidents, where raising to the power e does not change the message,
are rare with the large primes p_1 and p_2 used in practice. Still, it shows that there
are some subtleties in the proper choice of encryption exponent.


4.3 Exponentiation mod n 


The obvious method to compute m^k is to form m × m × ... × m (k factors),
which involves k - 1 multiplications. Since RSA uses exponents k with
around 100 digits, the number of multiplications in this method of
exponentiation will be around 10^100, a hopelessly large number. Thus the first
step towards efficient exponentiation is to drastically reduce the number of
multiplications; hopefully to a number around the size of log k, which is
proportional to the number of digits in k. We saw how to do this in Section
1.5, using the binary numeral for k.


Example. Construction of m^91.


We compute in turn

m = 1 × m
m^2 = m^2
m^5 = (m^2)^2 × m
m^11 = (m^5)^2 × m
m^22 = (m^11)^2
m^45 = (m^22)^2 × m
m^91 = (m^45)^2 × m

The total number of multiplications is the number of squarings (one less
than the number of binary digits in k) plus the number of multiplications
by m (no more than the number of binary digits in k). Hence the total
number of multiplications to compute m^k is no more than twice the number
of binary digits in k, and the number of binary digits is at most log_2 k + 1.


It is still not a good idea to compute m^k for a 100-digit number k, even
though it takes at most a few hundred multiplications, because the numbers being
multiplied will become astronomical in length.


What makes RSA feasible is that we do not need m^k but only its
remainder on division by n. Because of this we can compute with remainders
throughout, using the arithmetic of congruences. In particular, we need
never multiply numbers larger than n, and this is what makes
exponentiation mod n feasible. Even by the school method of multiplication (which is
not the most efficient known), multiplication of two n-digit numbers takes
around n^2 steps, hence for n around 100 the work required for a couple of
hundred multiplications is easily handled by a computer.
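The repeated squaring method, with reduction mod n after every step, can be sketched as follows. The function name power_mod and the multiplication counter are ours; the counter follows the convention of the example above by not counting the initial 1 × m. Python's built-in pow(m, k, n) computes the same remainder.

```python
def power_mod(m, k, n):
    """m^k mod n (k >= 1) by repeated squaring, with the multiplication count."""
    result, count = m % n, 0               # leading binary digit of k is 1
    for bit in bin(k)[3:]:                 # remaining binary digits of k
        result = result * result % n       # squaring step
        count += 1
        if bit == "1":
            result = result * m % n        # multiplication by m
            count += 1
    return result, count

value, count = power_mod(3, 91, 1000)
print(value, count, pow(3, 91, 1000))
```

For k = 91 the count is 10: six squarings and four multiplications by m, and no intermediate number ever exceeds n^2.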

Exercises


4.3.1 Check that the above example allows m^91 to be computed using 10
multiplications (not counting 1 × m).


4.3.2 Compute the binary numeral for 89, and hence show that m^89 can be
computed using 9 multiplications.


4.4 RSA encryption and decryption


If the user’s primes are $p_1$ and $p_2$, a message is written (using some
simple translation of letters into numerals) as a natural number $m$ less
than the publicly known product $n = p_1 p_2$. If the actual message is
larger than this, it is broken into sufficiently small chunks that are
encrypted one by one.

As foreshadowed in Section 4.2, the encrypted message sent to the user
is the remainder of $m^e$ when divided by $n$, which we abbreviate as


$$m^e \bmod n.$$


This is a natural notation for remainders and it will not lead to confusion
because


$$r = m \bmod n \implies r \equiv m \pmod{n}.$$


The numbers $e$ and $n$ are made public after having been computed by the
user from the primes $p_1$ and $p_2$: $n = p_1 p_2$ and $e$ is relatively
prime to $\varphi(n)$. It is feasible to compute $m^e \bmod n$, even though
$e$ and $n$ may have hundreds of digits, by the repeated squaring method
explained in the previous section. The user receives the encrypted message
$m^e \bmod n$ and raises it to the power $d$, mod $n$, where $d$ is the
inverse of $e$, mod $\varphi(n)$. The result is the original message $m$,
because $ed \equiv 1 \pmod{\varphi(n)}$, and therefore


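$$ed = 1 + k\varphi(n) \quad \text{for some } k.$$


Hence


$$(m^e)^d = m^{ed} = m^{1 + k\varphi(n)} = m \cdot (m^{\varphi(n)})^k$$
$$\equiv m (1)^k \pmod{n}, \quad \text{since } m^{\varphi(n)} \equiv 1 \pmod{n} \text{ by Euler’s theorem,}$$
$$\equiv m \pmod{n}.$$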
As with encryption, it is computationally feasible to raise a number
to the power $d$, mod $n$, provided $d$ is known. The decryption exponent
$d$ can be feasibly computed by the user, who knows the factors $p_1, p_2$
of $n$. These enable the computation of $\varphi(n) = (p_1 - 1)(p_2 - 1)$,
and then the computation of $d$ as the inverse of $e$, mod $\varphi(n)$, by
the Euclidean algorithm.

This computation of the inverse is feasible because the Euclidean
algorithm is similar in speed to exponentiation mod $n$ on numbers of
similar size, as remarked in Section 3.4, and $\varphi(n)$ is indeed just a
little smaller than $n$ when $n = p_1 p_2$.
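
The whole cycle (compute $\varphi(n)$ from the factors, invert $e$ by the Euclidean algorithm, encrypt, decrypt) can be sketched on the toy numbers $p_1 = 7$, $p_2 = 11$, $e = 13$ used in the exercises below. The helper names `extended_gcd` and `decryption_exponent` are hypothetical, and such tiny primes give no security whatsoever; this only illustrates the arithmetic.

```python
def extended_gcd(a, b):
    """Extended Euclidean algorithm: return (g, s, t) with g = gcd(a, b) = s*a + t*b."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t


def decryption_exponent(e, p1, p2):
    """Inverse of e mod phi(n), where n = p1*p2 and phi(n) = (p1 - 1)(p2 - 1)."""
    phi = (p1 - 1) * (p2 - 1)
    g, s, _ = extended_gcd(e, phi)
    if g != 1:
        raise ValueError("e must be relatively prime to phi(n)")
    return s % phi


p1, p2, e = 7, 11, 13            # toy values only
n = p1 * p2
d = decryption_exponent(e, p1, p2)

m = 7
c = pow(m, e, n)                 # encryption: m^e mod n
assert pow(c, d, n) == m         # decryption: c^d mod n recovers m
print(n, d, c)
```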


Exercises 


Continuing the toy example of RSA with $p_1 = 7$, $p_2 = 11$ and encryption
exponent $e = 13$:

4.4.1 Show that the message $m$ is encrypted as $((m^2 \cdot m)^2)^2 \cdot m \bmod 77$.

4.4.2 When $m = 7$ verify that the encrypted message is 35.


It is not guaranteed, however, that every message is disguised by the encryption 
process. This is obviously not the case for m = 1 and it can also happen for other 
values: 


4.4.3 When m = 12 verify that the encrypted message is also 12. 


4.4.4 Using the decryption exponent from Exercise 4.2.2, verify that decryption 
of 12 recovers the message 12. 


4.4.5 Explain the results of Exercises 4.4.2 and 4.4.3 by showing that
$12^6 \equiv 1 \pmod{77}$.


4.5 Digital signatures


Another use of RSA is to transmit a digital signature—a proof that the user 
is who he or she claims to be. For this purpose the user can demonstrate 
possession of knowledge that no one else could have, such as the personal 
decryption exponent d that goes with the public numbers e and n. 

This can be demonstrated, without revealing $d$, by taking some well
known message $m$ and sending $m^d \bmod n$. This is a scrambled message
that only the possessor of $d$ can create. But all the world knows $e$ and
$n$, hence they can unscramble $m^d \bmod n$ by raising it to the power $e$,
mod $n$:


$$(m^d)^e = m^{de} \equiv m \pmod{n}, \quad \text{as above.}$$


Since only $m^d \bmod n$ can unscramble to the recognizable message $m$ in
this fashion, the world can rest assured that the sender is indeed the
possessor of the secret number $d$.
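
As a rough illustration of the signing idea, the sketch below uses toy values $n = 77$, $e = 13$ and $d = 37$ (37 is the inverse of 13 mod $\varphi(77) = 60$, since $13 \cdot 37 = 481 = 8 \cdot 60 + 1$). The function names `sign` and `verify` are assumptions made for this example, not part of any standard scheme.

```python
n, e = 77, 13     # public numbers (toy values)
d = 37            # secret exponent: 13 * 37 = 481 = 8 * 60 + 1


def sign(m):
    """Scramble the well-known message m with the secret exponent d."""
    return pow(m, d, n)


def verify(m, signature):
    """Anyone can raise the signature to the public power e and compare with m."""
    return pow(signature, e, n) == m % n


signature = sign(7)
print(verify(7, signature))
```

Only the signer needs `d`; the check in `verify` uses the public numbers alone.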


4.6 Other computational issues


The security of RSA depends, in the first instance, on having a large supply
of 100-digit primes. If only a handful of such primes were available, an
opponent could break the system by trying all pairs of them as $p_1, p_2$
until a product $p_1 p_2$ equal to $n$ is found. Fortunately this is not a
problem: there are many large primes and it is computationally easy to find
them.

Thus an opponent’s real problem is to compute the decryption exponent
$d$ from the publicly known $e$ and $n$.

Since $d$ is inverse to $e$, mod $\varphi(n)$, and
$\varphi(n) = (p_1 - 1)(p_2 - 1)$, this would be feasible if the factors
$p_1$ and $p_2$ of $n$ were known. In fact it has been shown to be feasible
only if the factors of $n$ are known, hence decryption will remain difficult
as long as factorization remains difficult.

However, it is not known whether factorization is truly difficult. No
feasible method of factorization is known but it has not been proved that
no such method exists. A proof that there is no feasible method would
answer the so-called “P $\ne$ NP question”, for which a prize of $1,000,000
has been offered.

Roughly speaking, problems of type P (for “polynomial time”) can be
solved by short computations, like the problem of multiplication. Problems
of type NP (for “nondeterministic polynomial time”) have solutions that
are verifiable by short computations, but which may take a long time to
find in the first place. As we have seen, factorization is like this.
P $\ne$ NP says there are problems that are hard to solve but whose
solutions are easy to verify. No such problem has yet been proved to exist
though many good candidates are known (for example, the factorization
problem).


4.7 Discussion


In the mid-70s, when mathematicians became aware of problems with so- 
lutions that were apparently hard to find but easy to verify, it was proposed 
to use such problems in public key cryptosystems—systems where it was 
easy to encrypt a message but hard to decrypt without extra, secret, infor- 
mation. 
The idea of trapdoor functions, and their application to public key
cryptosystems, was first published by Diffie and Hellman (1976). They also
proposed exponentiation mod $n$ as a computationally feasible process that
might be hard to reverse. The implementation of this idea in RSA was first
published by Rivest et al. (1978) and it has since become the most
commonly used public key system. Just recently it was revealed that the
same system was also discovered a few years earlier, by Clifford Cocks in
the UK. Because it was part of his work for British Intelligence, it was
kept secret (though why this was any use after 1978 is hard to understand).
For more on the history of public key cryptosystems see Yan (2000).

The basic premise of RSA, that factorization is hard, was shaken by
a remarkable discovery of Shor (1994). Shor found that factorization can
be done in polynomial time on a quantum computer. The catch is that
quantum computers do not yet exist and perhaps never will. Nevertheless
Shor’s result throws a strange new light on the concept of computation.

In all existing computers the difficulty in factorization (and in many
other NP problems) is that the space of possible answers is exponentially
large relative to the question. For an $n$-digit number $K$ there are around
$10^{n/2}$ numbers less than $\sqrt{K}$, and to factorize $K$ we cannot do
much better than try all of them as potential divisors. Since one has to try
many things one after the other, factorization by all known methods takes
exponential time.

According to quantum theory, however, in the world of the atom many 
things actually happen at the same time in the same place. The hypotheti- 
cal quantum computer harnesses this possibility to do many computations 
simultaneously, and in this way it can factorize numbers in polynomial 
time. We say “hypothetical” advisedly, since it is not known whether a 
stable computer can actually be built from atom-sized components. 


PREVIEW 


The so-called Pell equation $x^2 - ny^2 = 1$ (wrongly attributed to Pell
by Euler) is one of the oldest equations in mathematics and it is
fundamental to the study of quadratic Diophantine equations. The


that its natural number solutions throw light on the nature of $\sqrt{2}$.
There is a similar connection between the natural number solutions
of $x^2 - ny^2 = 1$ and $\sqrt{n}$ when $n$ is any nonsquare natural number.


The irrationality of $\sqrt{n}$ when $n$ is nonsquare causes strange behavior


$\sqrt{n}$ reflects light back on the equation: it leads to simple algebraic
structure, and a simple general formula for all integer solutions of
$x^2 - ny^2 = 1$ in terms of the smallest natural number solution.


But there is no simple formula for the smallest natural number solution
and it is not trivial even to prove that it exists. In this chapter we
give two proofs: the first is a relatively direct proof due to Dirichlet,
based on the approximation of $\sqrt{n}$ by rational numbers. The second
(in the starred sections at the end of the chapter) is based on a more
general theory of quadratic forms due to Conway.


We include Conway’s theory because it is a natural extension of
our study of the Euclidean algorithm (particularly the results in the
starred sections of Chapter 2) and because it gives a very simple
explanation of periodicity phenomena connected with the Pell equation
and $\sqrt{n}$. It also gives a highly visual approach to the subject,
which makes the complex behavior of the Pell equation surprisingly
easy to grasp.
5.1 Side and diagonal numbers


The ancient Greeks met the equation $x^2 - 2y^2 = 1$ in their efforts to
understand $\sqrt{2}$, the diagonal of the unit square, which they knew to
be irrational. They found a way to produce arbitrarily large solutions
$(x_1, y_1), (x_2, y_2), \ldots$ of this equation, and hence fractions
$x_i/y_i$ that approximate $\sqrt{2}$ arbitrarily closely. The fractions
$x_i/y_i$ tend to $\sqrt{2}$, because if $x_i^2 - 2y_i^2 = 1$ then


$$\frac{x_i^2}{y_i^2} = 2 + \frac{1}{y_i^2} \to 2 \quad \text{as } y_i \to \infty.$$


Thus if $y_i$ is the side of a square, $x_i$ approximates the diagonal.
The Greeks discovered the solutions $(x_i, y_i)$ among the “side numbers”
$s_i$ and “diagonal numbers” $d_i$ defined by


$$d_1 = 3, \quad s_1 = 2,$$
$$d_{i+1} = d_i + 2s_i, \quad s_{i+1} = d_i + s_i.$$


It follows from these equations that


$$d_1^2 - 2s_1^2 = 1, \quad d_{i+1}^2 - 2s_{i+1}^2 = -(d_i^2 - 2s_i^2).$$
Hence the odd-numbered pairs $(d_1, s_1), (d_3, s_3), (d_5, s_5), \ldots$
satisfy the equation $x^2 - 2y^2 = 1$ while the rest satisfy
$x^2 - 2y^2 = -1$.

The first equation is an example of a Pell equation, the general form of
which is $x^2 - ny^2 = 1$ where $n$ is a nonsquare integer. The second is
closely related to it; in fact we later look at all values of $x^2 - ny^2$
in order to see whether they include the value 1.
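
The alternation between $+1$ and $-1$ is easy to observe numerically. The following sketch (the generator name `side_and_diagonal` is hypothetical) starts from $d_1 = 3$, $s_1 = 2$, applies the recurrence $d_{i+1} = d_i + 2s_i$, $s_{i+1} = d_i + s_i$, and prints $d_i^2 - 2s_i^2$ together with the ratio $d_i/s_i$.

```python
def side_and_diagonal(count):
    """Yield the pairs (d_i, s_i) for i = 1, ..., count, starting from d_1 = 3, s_1 = 2."""
    d, s = 3, 2
    for _ in range(count):
        yield d, s
        d, s = d + 2 * s, d + s      # d_{i+1} = d_i + 2 s_i,  s_{i+1} = d_i + s_i


for i, (d, s) in enumerate(side_and_diagonal(6), start=1):
    # d*d - 2*s*s alternates between 1 and -1; d/s approaches the square root of 2
    print(i, d, s, d * d - 2 * s * s, d / s)
```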


Irrational square roots 


In dealing with equations $x^2 - ny^2 = 1$, where $n$ is a nonsquare
integer, we rely heavily on the irrationality of $\sqrt{n}$ proved in
Section 2.5.

The upside of irrationality is that we can encode a pair of integers
$(a, b)$ by a single real number $a + b\sqrt{n}$; we say that this number
has rational part $a$ and irrational part $b$. Rational and irrational parts
are meaningful because if $\sqrt{n}$ is irrational,
$a_1, b_1, a_2, b_2 \in \mathbb{Z}$, and


$$a_1 + b_1\sqrt{n} = a_2 + b_2\sqrt{n},$$


then $a_1 = a_2$ and $b_1 = b_2$.
Suppose, on the contrary, that $b_1 \ne b_2$. Then
$$a_1 - a_2 = (b_2 - b_1)\sqrt{n},$$


and, since $b_2 - b_1 \ne 0$, we get $\sqrt{n} = \frac{a_1 - a_2}{b_2 - b_1}$.
This contradicts the irrationality of $\sqrt{n}$. Hence $b_1 = b_2$, and
therefore $a_1 = a_2$. □


Exercises 


In the sections that follow we use numbers of the form $x_i + y_i\sqrt{n}$
to encode solution pairs of $x^2 - ny^2 = 1$. To give a taste of how this
works, the following two exercises use numbers of the form $a + b\sqrt{2}$
to encode (diagonal, side) pairs.


5.1.1 Check that $(1 + \sqrt{2})^2 = 3 + 2\sqrt{2}$ and that
$(x + y\sqrt{2})(1 + \sqrt{2}) = x + 2y + (x + y)\sqrt{2}$.


5.1.2 Use induction to show from Exercise 5.1.1 that
$(1 + \sqrt{2})^{n+1} = d_n + s_n\sqrt{2}$.

When $n$ is an integer square, the equation $x^2 - ny^2 = 1$ is not so
interesting, so we dispose of it right now.

5.1.3 By factorizing the left-hand side of $x^2 - y^2 = 1$, show that it has
only two integer solutions.


5.1.4 Show similarly that $x^2 - ny^2 = 1$ has only two integer solutions
when $n$ is a square positive integer.


5.2 The equation $x^2 - 2y^2 = 1$


It is straightforward to find all rational solutions of $x^2 - ny^2 = 1$ by
Diophantus’ method (draw the line of slope $t$ through the rational point
$(1, 0)$). Thus the method of solution is completely independent of $n$.

It is a different matter to find even one integer solution of
$x^2 - ny^2 = 1$ other than the obvious ones $(\pm 1, 0)$. The least
positive solution $\ne (\pm 1, 0)$ depends on $n$ in a mysterious way.
However, once this least nontrivial solution is found, all other integer
solutions are generated by a simple formula. We illustrate the method for
the case $n = 2$.

When $x^2 - 2y^2 = 1$ the smallest integer solution $\ne (\pm 1, 0)$ can be
found by trial to be $(3, 2)$. Other solutions can then be found by the
following composition rule: if $(x_1, y_1)$ and $(x_2, y_2)$ are solutions
of $x^2 - 2y^2 = 1$, then so is $(x_3, y_3)$, where $x_3$ and $y_3$ are
defined by


$$(x_1 + y_1\sqrt{2})(x_2 + y_2\sqrt{2}) = x_3 + y_3\sqrt{2}.$$
To show that this rule gives a new solution we first calculate $x_3$ and
$y_3$. Expanding the left-hand side, and collecting its rational and
irrational parts, we find that


$$x_3 = x_1x_2 + 2y_1y_2, \quad y_3 = x_1y_2 + y_1x_2.$$
It can then be checked by multiplication that
$$(x_1x_2 + 2y_1y_2)^2 - 2(x_1y_2 + y_1x_2)^2 = (x_1^2 - 2y_1^2)(x_2^2 - 2y_2^2) = 1 \times 1 = 1.$$
Hence $x_3^2 - 2y_3^2 = 1$, as required. □

Examples. Composing the solution $(3, 2)$ with itself, we get a new solution
$(x_3, y_3)$, where
$$x_3 + y_3\sqrt{2} = (3 + 2\sqrt{2})^2 = 9 + 8 + 12\sqrt{2} = 17 + 12\sqrt{2}.$$
Equating rational and irrational parts, $x_3 = 17$, $y_3 = 12$, which is
indeed another solution. If we then compose $(17, 12)$ with $(3, 2)$ we get
$$(17 + 12\sqrt{2})(3 + 2\sqrt{2}) = 51 + 48 + (36 + 34)\sqrt{2} = 99 + 70\sqrt{2},$$


hence another solution is $(99, 70)$, and so on. By this process we can
obtain infinitely many integer solutions, but it is not clear how close we
are to finding all integer solutions. The situation becomes clearer when we
observe that a group structure is present.
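
The composition rule for $n = 2$ is mechanical enough to try directly. In the sketch below the function name `compose` is an assumption; it implements $(x_1, y_1), (x_2, y_2) \mapsto (x_1x_2 + 2y_1y_2,\ x_1y_2 + y_1x_2)$ and repeatedly composes with $(3, 2)$.

```python
def compose(sol1, sol2):
    """Compose two solutions of x^2 - 2y^2 = 1 by multiplying x1 + y1*sqrt(2) and x2 + y2*sqrt(2)."""
    x1, y1 = sol1
    x2, y2 = sol2
    return x1 * x2 + 2 * y1 * y2, x1 * y2 + y1 * x2


solutions = [(3, 2)]
for _ in range(3):
    solutions.append(compose(solutions[-1], (3, 2)))

for x, y in solutions:
    print(x, y, x * x - 2 * y * y)   # the last column should always be 1
```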


Exercises
Another way to arrive at the composition rule is to use the irrational
factorization
$$x^2 - 2y^2 = (x - y\sqrt{2})(x + y\sqrt{2}). \qquad (*)$$
We suppose that $1 = x_1^2 - 2y_1^2$ and $1 = x_2^2 - 2y_2^2$, so that
$$1 = 1 \times 1 = (x_1^2 - 2y_1^2)(x_2^2 - 2y_2^2). \qquad (**)$$


5.2.1 Apply the factorization (*) to each factor on the right-hand side of
(**), then combine the factors in a different way to show that


$$1 = [x_1x_2 + 2y_1y_2 - (x_1y_2 + y_1x_2)\sqrt{2}] \times [x_1x_2 + 2y_1y_2 + (x_1y_2 + y_1x_2)\sqrt{2}].$$


In Section 5.4 we generalize this method to find a composition rule for
solutions of $x^2 - ny^2 = 1$.
5.3 The group of solutions


Not only do solutions $(x_1, y_1)$ and $(x_2, y_2)$ of $x^2 - 2y^2 = 1$ have
a “product” $(x_1x_2 + 2y_1y_2,\ x_1y_2 + y_1x_2)$, corresponding to the
product of numbers


$$(x_1 + y_1\sqrt{2})(x_2 + y_2\sqrt{2}),$$


the numbers $x + y\sqrt{2}$ such that $x^2 - 2y^2 = 1$ include
$1 = 1 + 0\sqrt{2}$ and the multiplicative inverse $x - y\sqrt{2}$ of the
number $x + y\sqrt{2}$:


$$(x + y\sqrt{2})(x - y\sqrt{2}) = x^2 - 2y^2 = 1$$


since $x^2 - 2y^2 = 1$ by the assumption that $(x, y)$ is a solution.

Thus the solutions $(x, y)$ form a group, with the same structure as the
set of numbers $x + y\sqrt{2}$, where $x, y$ are integers such that
$x^2 - 2y^2 = 1$. To understand this group we first focus on the subgroup of
positive numbers $x + y\sqrt{2}$ where $x^2 - 2y^2 = 1$.


Structure of positive solutions. The group of positive $x + y\sqrt{2}$,
where $(x, y)$ is an integer solution of $x^2 - 2y^2 = 1$, is the infinite
cyclic group of powers of $3 + 2\sqrt{2}$.


To see why, apply the log function to all the positive numbers
$x + y\sqrt{2}$ where $x, y$ are integers such that $x^2 - 2y^2 = 1$. Since
$\log(ab) = \log a + \log b$, the resulting numbers $\log(x + y\sqrt{2})$
then form a group under $+$.

This group has a least positive element, $\log(3 + 2\sqrt{2})$, because


• $3 + 2\sqrt{2}$ is the least $x + y\sqrt{2}$ corresponding to solutions
  $(x, y)$ with $x, y > 0$,


• solutions $(x, -y)$ with $y > 0$ are inverses of solutions $(x, y)$ with
  $x, y > 0$. Hence the corresponding $x - y\sqrt{2}$ are $< 1$, and their
  logs are $< 0$.


But any such group of numbers consists of the integer multiples of its
least positive element $m$: if any element $k$ lies between multiples of $m$,


$$mn < k < m(n + 1),$$
we also have $k - mn$ in the group, and the size of this element,
$$0 < k - mn < m,$$


contradicts the minimality of $m$. □
Thus all solutions $(x, y)$ of $x^2 - 2y^2 = 1$ for which
$x + y\sqrt{2} > 0$ correspond to powers of $3 + 2\sqrt{2}$. Now for any
solution $(x, y)$ either $x + y\sqrt{2}$ or $-x - y\sqrt{2}$ is $> 0$. Hence
the remaining solutions $(x, y)$ are just the negatives of those obtained
from the powers of $3 + 2\sqrt{2}$.


Exercises 
Suppose we define integer pairs $(u_k, v_k)$ by the equation
$$u_k + v_k\sqrt{2} = (3 + 2\sqrt{2})^k \quad \text{for all integers } k.$$


Then what we have just proved is that the pairs $(u_k, v_k)$ are all the
integer solutions $(x, y)$ of $x^2 - 2y^2 = 1$ with $x$ positive. It is now
quite easy to express $u_k$ and $v_k$ as explicit functions of $k$, though
(not surprisingly) these functions involve $\sqrt{2}$.


$$u_k = \frac{1}{2}\left[(3 + 2\sqrt{2})^k + (3 - 2\sqrt{2})^k\right], \quad v_k = \frac{1}{2\sqrt{2}}\left[(3 + 2\sqrt{2})^k - (3 - 2\sqrt{2})^k\right]$$

5.3.3 Deduce from Exercise 5.3.2 that $u_k$ = nearest integer to
$(3 + 2\sqrt{2})^k/2$. And
If $n$ is a nonsquare integer we define


$$\mathbb{Z}[\sqrt{n}] = \{x + y\sqrt{n} : x, y \in \mathbb{Z}\}.$$


Just as we used the numbers $x + y\sqrt{2}$ to study $x^2 - 2y^2 = 1$ we use
the numbers $x + y\sqrt{n}$ to study $x^2 - ny^2 = 1$.

In fact, $x^2 - ny^2$ is what we call the norm of $x + y\sqrt{n}$ in
$\mathbb{Z}[\sqrt{n}]$, the product of $x + y\sqrt{n}$ by its conjugate
$x - y\sqrt{n}$:


$$\operatorname{norm}(x + y\sqrt{n}) = (x - y\sqrt{n})(x + y\sqrt{n}) = x^2 - ny^2.$$


Thus finding solutions of the Pell equation is the same as finding elements
of $\mathbb{Z}[\sqrt{n}]$ with norm 1.

The advantage of searching in $\mathbb{Z}[\sqrt{n}]$, rather than among
pairs $(x, y)$ of integers, is that we can use algebra on numbers in
$\mathbb{Z}[\sqrt{n}]$.
Brahmagupta composition rule. If $(x_1, y_1)$ and $(x_2, y_2)$ are both
solutions of the Pell equation $x^2 - ny^2 = 1$, then so is


$$(x_3, y_3) = (x_1x_2 + ny_1y_2,\ x_1y_2 + y_1x_2).$$


This generalizes the “composition” rule used for $n = 2$ in Section 5.2 and
it may be proved as follows, using factorization in $\mathbb{Z}[\sqrt{n}]$.
Since $(x_1, y_1)$ and $(x_2, y_2)$ are solutions,


$$x_1^2 - ny_1^2 = 1 = x_2^2 - ny_2^2.$$


Therefore


$$1 = (x_1^2 - ny_1^2)(x_2^2 - ny_2^2)$$
$$= (x_1 - y_1\sqrt{n})(x_1 + y_1\sqrt{n}) \times (x_2 - y_2\sqrt{n})(x_2 + y_2\sqrt{n})$$
$$= (x_1 - y_1\sqrt{n})(x_2 - y_2\sqrt{n}) \times (x_1 + y_1\sqrt{n})(x_2 + y_2\sqrt{n})$$
$$= [x_1x_2 + ny_1y_2 - (x_1y_2 + y_1x_2)\sqrt{n}] \times [x_1x_2 + ny_1y_2 + (x_1y_2 + y_1x_2)\sqrt{n}]$$
$$= (x_1x_2 + ny_1y_2)^2 - n(x_1y_2 + y_1x_2)^2$$
$$= x_3^2 - ny_3^2. \quad □$$


This “composition” of solutions to form a new solution was discovered
by the Indian mathematician Brahmagupta around 600 CE (but without
using $\sqrt{n}$).

We also have an identity solution $(1, 0)$ and an inverse $(x, -y)$ of each
solution $(x, y)$, hence the solutions form a group, as we saw previously in
the special case $n = 2$. As in that case, we can prove that all solutions
come from powers of the smallest positive solution.


Example. Solutions of $x^2 - 3y^2 = 1$.


We find by trial that the smallest positive solution is $(2, 1)$. Composing
$(2, 1)$ with itself we get the solutions


$$(2 \times 2 + 3 \times 1 \times 1,\ 2 \times 1 + 1 \times 2) = (7, 4),$$
$$(2 \times 7 + 3 \times 1 \times 4,\ 2 \times 4 + 1 \times 7) = (26, 15),$$


and so on. These solutions correspond to the powers of $2 + \sqrt{3}$.
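
Both steps of this example, finding the smallest solution "by trial" and composing it with itself, can be sketched for general nonsquare $n$. The names `smallest_solution`, `brahmagupta` and the search bound `y_limit` are assumptions for illustration; the trial search is only practical when the smallest solution is small.

```python
from math import isqrt


def smallest_solution(n, y_limit=10**6):
    """Search y = 1, 2, ... for the first y making n*y^2 + 1 a perfect square."""
    for y in range(1, y_limit):
        x_squared = n * y * y + 1
        x = isqrt(x_squared)
        if x * x == x_squared:
            return x, y
    return None                      # nothing found below the (arbitrary) bound


def brahmagupta(n, sol1, sol2):
    """Brahmagupta composition of two solutions of x^2 - n*y^2 = 1."""
    x1, y1 = sol1
    x2, y2 = sol2
    return x1 * x2 + n * y1 * y2, x1 * y2 + y1 * x2


n = 3
first = smallest_solution(n)
solution = first
for _ in range(4):
    x, y = solution
    print(solution, x * x - n * y * y)
    solution = brahmagupta(n, solution, first)
```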


The calculation used to prove the Brahmagupta composition rule actually
shows a more general property, which holds not only with integer
coefficients $x, y$ but also with rational coefficients, that is, quotients
of integers. We use the symbol $\mathbb{Q}$ (“quotients”) for the rational
numbers and make the natural generalization of $\mathbb{Z}[\sqrt{n}]$ to


$$\mathbb{Q}[\sqrt{n}] = \{x + y\sqrt{n} : x, y \in \mathbb{Q}\}.$$
This set of numbers is the set of quotients of elements of
$\mathbb{Z}[\sqrt{n}]$ and it is a number field, that is, closed under $+$,
$-$, $\times$, and $\div$ (by nonzero members). The closure properties are
easily checked by calculation (exercises).
We extend the definition of norm to $\mathbb{Q}[\sqrt{n}]$ by the same
formula
$$\operatorname{norm}(x + y\sqrt{n}) = x^2 - ny^2.$$
This formula remains meaningful because each element of
$\mathbb{Q}[\sqrt{n}]$ is uniquely expressible as $x + y\sqrt{n}$ with
$x, y \in \mathbb{Q}$, by the argument of Section 5.1.

Multiplicative property of the norm. For any $\alpha$ and $\beta$ in
$\mathbb{Q}[\sqrt{n}]$


$$\operatorname{norm}(\alpha)\operatorname{norm}(\beta) = \operatorname{norm}(\alpha\beta).$$


Proof. Let $\alpha = x_1 + y_1\sqrt{n}$ and $\beta = x_2 + y_2\sqrt{n}$. Then


$$\operatorname{norm}(\alpha)\operatorname{norm}(\beta) = (x_1^2 - ny_1^2)(x_2^2 - ny_2^2)$$
$$= (x_1x_2 + ny_1y_2)^2 - n(x_1y_2 + y_1x_2)^2 \quad \text{by the calculation above}$$
$$= \operatorname{norm}(\alpha\beta). \quad □$$
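
Since the multiplicative property is an algebraic identity, it can be spot-checked with exact rational arithmetic. In this sketch an element $x + y\sqrt{n}$ of $\mathbb{Q}[\sqrt{n}]$ is represented as a pair `(x, y)` of `Fraction` values; the representation and the function names `multiply` and `norm` are assumptions made for the example, as are the sample values.

```python
from fractions import Fraction


def multiply(n, alpha, beta):
    """Product of x1 + y1*sqrt(n) and x2 + y2*sqrt(n), returned as a pair (x, y)."""
    x1, y1 = alpha
    x2, y2 = beta
    return x1 * x2 + n * y1 * y2, x1 * y2 + y1 * x2


def norm(n, alpha):
    """norm(x + y*sqrt(n)) = x^2 - n*y^2."""
    x, y = alpha
    return x * x - n * y * y


n = 5
alpha = (Fraction(1, 2), Fraction(3, 4))     # 1/2 + (3/4) sqrt(5)
beta = (Fraction(-2, 3), Fraction(5, 7))     # -2/3 + (5/7) sqrt(5)

print(norm(n, alpha) * norm(n, beta) == norm(n, multiply(n, alpha, beta)))
```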


Exercises 


5.4.1 Show that $+$, $-$, and $\times$ of numbers in $\mathbb{Q}[\sqrt{n}]$
are themselves numbers in $\mathbb{Q}[\sqrt{n}]$.
5.4.2 Show that $1/(x + y\sqrt{n})$ for $x, y \in \mathbb{Q}$ (not both zero)
is of the form $x' + y'\sqrt{n}$ for $x', y' \in \mathbb{Q}$. Deduce that
$\mathbb{Q}[\sqrt{n}]$ is closed under $\div$ by nonzero members.


The multiplicative property of the norm can be restated as follows.


Brahmagupta used this fact to solve $x^2 - ny^2 = 1$ via easier equations
$x^2 - ny^2 = k$.

His method is most convenient when there is an obvious solution of
$x^2 - ny^2 = -1$.

5.4.4 Find a nontrivial solution of $x^2 - 17y^2 = -1$ by inspection, and
use it to find a nontrivial solution of $x^2 - 17y^2 = 1$.
5.5 The pigeonhole argument


The smallest nontrivial solution of $x^2 - ny^2 = 1$ is not always so easy
to find as for $n = 2$ and $n = 3$. For example, the smallest nontrivial
solution of $x^2 - 61y^2 = 1$ is


$$(x, y) = (1766319049, 226153980)!$$


This amazing example was discovered by Bhaskara II in 12th-century India
and rediscovered by Fermat.

The smallest nontrivial solution appears so unpredictably that its
existence is not clear in general. However, Lagrange proved in 1768 that if
$n$ is any nonsquare positive integer, the Pell equation $x^2 - ny^2 = 1$
has an integer solution $\ne (\pm 1, 0)$.

An interesting new proof of this was given by Dirichlet around 1840. 
He used what is now called the “pigeonhole principle”: if more than k 
pigeons go into k boxes then at least one box contains at least two pigeons 
(finite version); if infinitely many pigeons go into k boxes, then at least one 
box contains infinitely many pigeons (infinite version). 

Dirichlet’s argument can be subdivided into the following steps. First, 
a theorem on the approximation of irrational numbers: 


Dirichlet’s approximation theorem. For any irrational $\sqrt{n}$ and
integer $B > 0$ there are integers $a, b$ with $0 < b < B$ and


$$|a - b\sqrt{n}| < \frac{1}{B}.$$


Proof. For any integer $B > 0$ consider the $B - 1$ numbers $\sqrt{n}$,
$2\sqrt{n}$, ..., $(B - 1)\sqrt{n}$. For each multiplier $k$ choose the
integer $A_k$ such that


$$0 < A_k - k\sqrt{n} < 1.$$


Since $\sqrt{n}$ is irrational, the $B - 1$ numbers $A_k - k\sqrt{n}$ are
strictly between 0 and 1 and they are all different for the same reason (by
the result of Section 5.1). Thus we have $B + 1$ different numbers


$$0,\ A_1 - \sqrt{n},\ A_2 - 2\sqrt{n},\ \ldots,\ A_{B-1} - (B - 1)\sqrt{n},\ 1$$


in the interval from 0 to 1.
If we then divide this interval into $B$ subintervals of length $1/B$, it
follows by the finite pigeonhole principle that at least one subinterval
contains two of the numbers. The difference between these two numbers, which
is of the form $a - b\sqrt{n}$ for some integers $a$ and $b$, is therefore
irrational and such that


$$|a - b\sqrt{n}| < \frac{1}{B}.$$


Also, $b < B$ because $b$ is the difference of two positive integers less
than $B$. □
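
The theorem guarantees that a suitable pair exists, so a direct search over $0 < b < B$ must succeed. The sketch below is a brute-force search, not the pigeonhole argument itself; the name `dirichlet_pair` is hypothetical, and the code assumes $n \ge 2$ nonsquare and $B \ge 2$. It avoids floating point by testing $|Ba - Bb\sqrt{n}| < 1$ through squares of integers.

```python
from math import isqrt


def dirichlet_pair(n, B):
    """Find integers a, b with 0 < b < B and |a - b*sqrt(n)| < 1/B (n nonsquare, B >= 2)."""
    for b in range(1, B):
        floor_a = isqrt(n * b * b)             # integer part of b*sqrt(n)
        for a in (floor_a, floor_a + 1):       # the two integers nearest b*sqrt(n)
            X = B * a
            Y_squared = B * B * b * b * n      # (B*b*sqrt(n))^2
            # |X - sqrt(Y_squared)| < 1  <=>  (X - 1)^2 < Y_squared < (X + 1)^2, as X >= 1
            if (X - 1) ** 2 < Y_squared < (X + 1) ** 2:
                return a, b
    return None


print(dirichlet_pair(2, 10))
```

For $n = 2$ and $B = 10$ the first pair found is $(7, 5)$, since $5\sqrt{2} \approx 7.071$ lies within $1/10$ of 7.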


The next few steps are short and directed towards applications of the 
infinite pigeonhole principle. 


1. Since Dirichlet’s approximation theorem holds for all $B > 0$, we can
   make $1/B$ arbitrarily small, thus forcing the choice of new values
   of $a$ and $b$. Thus there are infinitely many integer pairs $(a, b)$
   with $|a - b\sqrt{n}| < 1/B$. Since $0 < b < B$, we have


   $$|a - b\sqrt{n}| < \frac{1}{b}.$$


2. It follows from step 1 that


   $$|a + b\sqrt{n}| \le |a - b\sqrt{n}| + |2b\sqrt{n}| \le |3b\sqrt{n}|,$$


   and therefore
   $$|a^2 - nb^2| \le \frac{1}{b} \cdot 3b\sqrt{n} = 3\sqrt{n}.$$
   Hence there are infinitely many $a - b\sqrt{n} \in \mathbb{Z}[\sqrt{n}]$
   with norm of size $\le 3\sqrt{n}$.
3. By the infinite pigeonhole principle we obtain in turn
   • infinitely many $a - b\sqrt{n}$ with the same norm, $N$ say,


   • infinitely many of these with $a$ in the same congruence class,
     mod $N$,


   • infinitely many of these with $b$ in the same congruence class,
     mod $N$.


4. From step 3 we get two positive numbers, $a_1 - b_1\sqrt{n}$ and
   $a_2 - b_2\sqrt{n}$, with


   • the same norm $N$,
   • $a_1 \equiv a_2 \pmod{N}$,


   • $b_1 \equiv b_2 \pmod{N}$.


The final step uses the quotient $a - b\sqrt{n}$ of the two numbers just
found. Its norm $a^2 - nb^2$ is clearly 1 by the multiplicative property of
norm. It is not so clear that $a$ and $b$ are integers, but this now follows
from the congruence conditions in step 4.


Nontrivial solution of the Pell equation. When $n$ is a nonsquare positive
integer, the equation $x^2 - ny^2 = 1$ has an integer solution
$(a, b) \ne (\pm 1, 0)$.


Proof. Consider the quotient $a - b\sqrt{n}$ of the two numbers
$a_1 - b_1\sqrt{n}$ and $a_2 - b_2\sqrt{n}$ found in step 4. We have


$$a - b\sqrt{n} = \frac{a_1 - b_1\sqrt{n}}{a_2 - b_2\sqrt{n}} = \frac{(a_1 - b_1\sqrt{n})(a_2 + b_2\sqrt{n})}{a_2^2 - nb_2^2}$$
$$= \frac{a_1a_2 - nb_1b_2}{N} + \frac{a_1b_2 - b_1a_2}{N}\sqrt{n},$$


where $N = a_2^2 - nb_2^2$ is the common norm of $a_1 - b_1\sqrt{n}$ and
$a_2 - b_2\sqrt{n}$. Since the latter numbers have equal norms, their
quotient $a - b\sqrt{n}$ has norm 1 by the multiplicative property of norm
(Section 5.4).

Since $a_1 - b_1\sqrt{n}$ and $a_2 - b_2\sqrt{n}$ are unequal and positive,
their quotient $a - b\sqrt{n} \ne \pm 1$. It remains to show that $a$ and
$b$ are integers. This amounts to showing that $N$ divides
$a_1a_2 - nb_1b_2$ and $a_1b_2 - b_1a_2$, or that


$$a_1a_2 - nb_1b_2 \equiv a_1b_2 - b_1a_2 \equiv 0 \pmod{N}.$$


The first congruence follows from the fact that $a_1^2 - nb_1^2 = N$, which
implies


$$0 \equiv a_1^2 - nb_1^2 \equiv a_1a_1 - nb_1b_1 \equiv a_1a_2 - nb_1b_2 \pmod{N},$$


replacing $a_1$ and $b_1$ by their respective congruent values
$a_2 \equiv a_1 \pmod{N}$ and $b_2 \equiv b_1 \pmod{N}$ found in step 4.

The second congruence follows from $a_1 \equiv a_2 \pmod{N}$ and
$b_1 \equiv b_2 \pmod{N}$ by multiplying to obtain
$a_1b_2 \equiv b_1a_2 \pmod{N}$, in other words,
$a_1b_2 - b_1a_2 \equiv 0 \pmod{N}$. □
5.6 *Quadratic forms


Dirichlet’s pigeonhole argument is one of the neatest ways to prove the
existence of nontrivial solutions of the Pell equation and it contains ideas
that can be applied in other situations. Nevertheless, it is not obviously
relevant to other quadratic Diophantine equations, so there is reason to
give a second proof: one that draws on a general theory of quadratic forms.

A binary quadratic form $Ax^2 + Bxy + Cy^2$, where $A, B, C \in \mathbb{Z}$,
can be viewed as an integer-valued function of integer pairs, or vectors
$(x, y)$. Many classical questions in number theory are concerned with the
values of quadratic forms. For example, the Pell equation asks whether 1 is
a value of the form $x^2 - ny^2$, when $n$ is a nonsquare natural number. To
approach such questions we use two elementary properties of quadratic forms
that can be confirmed by simple algebra.


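Properties of quadratic forms. If $f(x, y) = Ax^2 + Bxy + Cy^2$ and
$\mathbf{v} = (x, y)$ then


1. $f(k\mathbf{v}) = k^2 f(\mathbf{v})$,
2. $f(\mathbf{v}_1 + \mathbf{v}_2) + f(\mathbf{v}_1 - \mathbf{v}_2) = 2[f(\mathbf{v}_1) + f(\mathbf{v}_2)]$.

Proof. 1. If $\mathbf{v} = (x, y)$ then $k\mathbf{v} = (kx, ky)$. Hence
$$f(k\mathbf{v}) = A(kx)^2 + B(kx)(ky) + C(ky)^2 = k^2(Ax^2 + Bxy + Cy^2) = k^2 f(\mathbf{v}).$$
2. If $\mathbf{v}_1 = (x_1, y_1)$ and $\mathbf{v}_2 = (x_2, y_2)$ then
$$f(\mathbf{v}_1) = Ax_1^2 + Bx_1y_1 + Cy_1^2 \quad \text{and} \quad f(\mathbf{v}_2) = Ax_2^2 + Bx_2y_2 + Cy_2^2.$$
Also


$$f(\mathbf{v}_1 + \mathbf{v}_2) = A(x_1 + x_2)^2 + B(x_1 + x_2)(y_1 + y_2) + C(y_1 + y_2)^2$$
$$= Ax_1^2 + Ax_2^2 + Bx_1y_1 + Bx_2y_2 + Cy_1^2 + Cy_2^2 + 2Ax_1x_2 + Bx_1y_2 + Bx_2y_1 + 2Cy_1y_2,$$
$$f(\mathbf{v}_1 - \mathbf{v}_2) = A(x_1 - x_2)^2 + B(x_1 - x_2)(y_1 - y_2) + C(y_1 - y_2)^2$$
$$= Ax_1^2 + Ax_2^2 + Bx_1y_1 + Bx_2y_2 + Cy_1^2 + Cy_2^2 - 2Ax_1x_2 - Bx_1y_2 - Bx_2y_1 - 2Cy_1y_2.$$
Hence


$$f(\mathbf{v}_1 + \mathbf{v}_2) + f(\mathbf{v}_1 - \mathbf{v}_2) = 2Ax_1^2 + 2Bx_1y_1 + 2Cy_1^2 + 2Ax_2^2 + 2Bx_2y_2 + 2Cy_2^2$$
$$= 2[f(\mathbf{v}_1) + f(\mathbf{v}_2)]. \quad □$$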
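
Both properties are polynomial identities, so they can be confirmed on sample vectors. The sketch below uses a hypothetical helper `make_form` that returns $f$ for given coefficients $A, B, C$; the particular form and vectors are arbitrary choices for illustration.

```python
def make_form(A, B, C):
    """Return the function f(v) = A*x^2 + B*x*y + C*y^2 for v = (x, y)."""
    def f(v):
        x, y = v
        return A * x * x + B * x * y + C * y * y
    return f


f = make_form(1, 0, -2)              # the form x^2 - 2y^2
v1, v2 = (3, 2), (1, 1)
v_sum = (v1[0] + v2[0], v1[1] + v2[1])
v_diff = (v1[0] - v2[0], v1[1] - v2[1])

k = 5
print(f((k * v1[0], k * v1[1])) == k * k * f(v1))        # Property 1
print(f(v_sum) + f(v_diff) == 2 * (f(v1) + f(v2)))       # Property 2
```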


A simple consequence of Property 1 is that $f(-\mathbf{v}) = f(\mathbf{v})$,
so a quadratic form makes no distinction between a vector $\mathbf{v}$ and
its negative. Property 1 also says that $f(k\mathbf{v})$ is a multiple of
$f(\mathbf{v})$; in particular $f(\mathbf{v})$ is prime (or 1) only for
vectors $\mathbf{v} = (x, y)$ that are not integer multiples of other
integer vectors, that is, for $(x, y)$ with relatively prime $x$ and $y$. We
call these primitive vectors.

In Section 2.8 we found a map of all the primitive vectors with positive
$x$ and $y$. We also found that the latter vectors are generated from
$\mathbf{i} = (1, 0)$ and $\mathbf{j} = (0, 1)$ by the processes
$(\mathbf{v}_1, \mathbf{v}_2) \mapsto (\mathbf{v}_1 + \mathbf{v}_2, \mathbf{v}_2)$
and
$(\mathbf{v}_1, \mathbf{v}_2) \mapsto (\mathbf{v}_1, \mathbf{v}_1 + \mathbf{v}_2)$.
In the next section we see that vectors with $x$ and $y$ of opposite sign
are similarly generated from $(0, -1)$ and $(1, 0)$. Then Property 2 shows
that there is a simple relation between the values of $f$ at successive
stages in these processes. This leads to a “map” of the values of $f$.


Equivalent forms 


Another view of a quadratic form $f$, related to the one described above,
surveys all equivalent forms $f'(x, y) = f(px + qy, rx + sy)$, obtained by
replacing the row vector $(x \ y)$ by


$$(px + qy \quad rx + sy) = (x \ y)\begin{pmatrix} p & r \\ q & s \end{pmatrix} = (x \ y)M,$$


where $M$ and its inverse $M^{-1}$ both have integer entries. When $M$
satisfies these conditions, the pairs $(px + qy, rx + sy)$ run through the
set $\mathbb{Z}^2$ of all integer pairs when $(x, y)$ does. Indeed, if
$(x', y')$ is any integer pair, we have
$$(x' \ y') = (x \ y)M \iff (x \ y) = (x' \ y')M^{-1}.$$

Thus equivalent forms have the same set of values. Examples are $x^2 + y^2$
and $x^2 + 2xy + 2y^2$, the latter obtained from $x^2 + y^2$ when $(x, y)$
is replaced by $(x + y, y)$.

When $M$ and $M^{-1}$ both have integer entries, then $\det M$ and
$\det M^{-1}$ are both integers. Since
$$MM^{-1} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$
it follows by taking the determinant of both sides that
$$\det M \cdot \det M^{-1} = 1$$


(due to the multiplicative property:
$\det(M_1M_2) = \det M_1 \cdot \det M_2$). The only possible values for
$\det M$ and $\det M^{-1}$ are therefore $\pm 1$. Thus the condition for a
matrix $M$ to define an equivalence of quadratic forms is that $M$ have
integer entries and that $\det M = ps - qr = \pm 1$. Such a matrix is called
unimodular.

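Now an arbitrary quadratic form can be expressed as a matrix product,


$$Ax^2 + Bxy + Cy^2 = (x \ y)\begin{pmatrix} A & B/2 \\ B/2 & C \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}. \qquad (*)$$


So it follows from what we have just seen that any equivalent form is
obtained by replacing


$$\begin{pmatrix} A & B/2 \\ B/2 & C \end{pmatrix} \quad \text{by} \quad M\begin{pmatrix} A & B/2 \\ B/2 & C \end{pmatrix}M^{T},$$


where $M$ is unimodular and $M^T$ is its transpose. This is so because the
new matrix effects the replacement of $(x \ y)$ by $(x \ y)M$.

Formula (*) reveals an invariant of the form $Ax^2 + Bxy + Cy^2$ under
equivalence, namely the determinant $AC - B^2/4$ of its matrix. Indeed, the
determinant of any equivalent,


$$\det\left(M\begin{pmatrix} A & B/2 \\ B/2 & C \end{pmatrix}M^{T}\right),$$


is equal (again by the multiplicative property of determinants) to


$$\det M \cdot \det\begin{pmatrix} A & B/2 \\ B/2 & C \end{pmatrix} \cdot \det M^{T} = (\pm 1)^2 \det\begin{pmatrix} A & B/2 \\ B/2 & C \end{pmatrix} = \det\begin{pmatrix} A & B/2 \\ B/2 & C \end{pmatrix},$$


since $\det M = \det M^{T} = \pm 1$ by hypothesis. Thus all equivalents of
the form $Ax^2 + Bxy + Cy^2$ have the same determinant.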
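
The invariance can be checked on the example $x^2 + y^2 \mapsto x^2 + 2xy + 2y^2$ given earlier. In this sketch a form is stored as its coefficient triple `(A, B, C)` and $M$ as `((p, r), (q, s))`, so that $(x \ y)M = (px + qy \quad rx + sy)$; these representations and the function names are assumptions for illustration. To stay in integers it compares $4AC - B^2$, which is four times the determinant.

```python
def substitute(form, M):
    """Coefficients of f(p*x + q*y, r*x + s*y), where form = (A, B, C) and M = ((p, r), (q, s))."""
    A, B, C = form
    (p, r), (q, s) = M
    return (
        A * p * p + B * p * r + C * r * r,                   # coefficient of x^2, equal to f(p, r)
        2 * A * p * q + B * (p * s + q * r) + 2 * C * r * s, # coefficient of xy
        A * q * q + B * q * s + C * s * s,                   # coefficient of y^2, equal to f(q, s)
    )


def four_times_determinant(form):
    """4(AC - B^2/4) = 4AC - B^2, kept as an integer."""
    A, B, C = form
    return 4 * A * C - B * B


M = ((1, 0), (1, 1))                 # replaces (x, y) by (x + y, y); det M = 1
new_form = substitute((1, 0, 1), M)  # start from x^2 + y^2
print(new_form, four_times_determinant((1, 0, 1)), four_times_determinant(new_form))
```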
Exercises


Although equivalent forms have the same determinant, the converse is not
always true. It so happens that the form $x^2 + y^2$ is equivalent to all
other forms with determinant 1, but $x^2 + 5y^2$ is not equivalent to all
other forms with determinant 5.


5.6.1 Show that $13x^2 + 16xy + 5y^2$ has determinant 1, and that it is
equivalent to $x^2 + y^2$ via the matrix M = ( , , )

5.6.2 Show that $2x^2 + 2xy + 3y^2$ has the same determinant as
$x^2 + 5y^2$, but that it is not equivalent to $x^2 + 5y^2$, by showing that
$x^2 + 5y^2$ does not take the value 7.

5.6.3 More generally, show that $x^2 + 5y^2$ takes no values
$\equiv 3$ or $7 \pmod{20}$, by working out the possible values of
$x^2 + 5y^2 \pmod{20}$.


5.7 *The map of primitive vectors 


In Section 2.8 we described a partition of the plane (a “map”) into regions
labelled by $(1, 0)$, $(0, 1)$ and all the primitive vectors $(a, b)$ of
natural numbers. Figure 5.1 (right half) shows this map again, rotated
through 90°, together with a near mirror image of it (left half) in which
the second coordinate of each pair has a negative sign.


Figure 5.1: Two partial maps of primitive vectors 


Also in the right half of the figure we have the schematic vector sum 
rule that generates all the labels from (1,0) and (0,1), and in the left half 
the mirror image rule that obviously applies there. 
We put these two maps side by side because we want to join them
together, but we seem prevented from doing so by the incompatible labels,
$(0, 1)$ and $(0, -1)$, in the upper central region. The conflict can be
resolved by giving each label a $\pm$ sign. This yields Figure 5.2, which we
call the (complete) map of primitive vectors, for the obvious reason that it
contains every primitive vector. The $\pm$ labelling fuses the two vector
sum rules into the single vector difference/sum rule shown at the bottom of
the Figure.


Figure 5.2: The complete map of primitive vectors


This rule needs some clarification because of the ambiguous signs. In
a ± pair of vectors, say ±(1,2), we are free to choose either (1,2) or
-(1,2) as v_1. Likewise for the pair, say ±(2,3), labelling a region below
an edge of region ±v_1: we can choose either (2,3) or -(2,3) to be v_2.
The vector difference/sum rule says that, for some choice of v_1 and v_2, the
region between v_1 and v_2 at the left end of their common edge is labelled
±(v_1 - v_2) and the region at the right end is labelled ±(v_1 + v_2). In this
example the regions are as in Figure 5.3.


Figure 5.3: Regions above, below, and at the ends of an edge 

Figure 5.3 shows how lines may be deformed to conform with the
schematic diagram for the difference/sum rule (in particular, the edge
common to regions ±(1,2) and ±(2,3) is not really horizontal) within bounds
that preserve the meanings of “above”, “below”, “right end”, and “left end”
for the edge common to the regions ±(1,2) and ±(2,3). Here the choice


v_1 = (1,2), v_2 = (2,3) gives v_1 + v_2 = (3,5), v_1 - v_2 = -(1,1),


so at the right end ±(3,5) = ±(v_1 + v_2) and at the left end ±(1,1) =
±(v_1 - v_2), as required.

It follows from the vector sum rules in the separate left and right maps 
in Figure 5.1 that the vector difference/sum rule holds in the complete map. 
This is proved by a finite number of simple checks, similar to the example 
above but more general. The details are left to the exercises. 
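
The difference/sum rule can be tried out numerically. The sketch below is an
illustration only, not part of the original text; the function names normalize
and difference_sum, and the convention of choosing the representative of ±v whose
first nonzero coordinate is positive, are assumptions made here.

```python
def normalize(v):
    """Pick the representative of ±v whose first nonzero coordinate is positive."""
    x, y = v
    return (-x, -y) if (x < 0 or (x == 0 and y < 0)) else (x, y)

def difference_sum(v1, v2):
    """Labels ±(v1 - v2) and ±(v1 + v2) of the regions at the two ends of an edge."""
    diff = normalize((v1[0] - v2[0], v1[1] - v2[1]))
    total = normalize((v1[0] + v2[0], v1[1] + v2[1]))
    return diff, total

# The example in the text: v1 = (1,2), v2 = (2,3).
print(difference_sum((1, 2), (2, 3)))
# Changing the sign of one vector swaps the roles of difference and sum,
# which is why the rule is stated "for some choice" of v1 and v2.
print(difference_sum((-1, -2), (2, 3)))
```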

The sign ambiguity ±(x,y) has no effect on the value of a quadratic
form because


Ax^2 + Bxy + Cy^2 = A(-x)^2 + B(-x)(-y) + C(-y)^2.


Hence the map of primitive vectors gives an unambiguous map of all values
of the quadratic form f(x,y) = Ax^2 + Bxy + Cy^2 for relatively prime x and
y, obtained by entering each value f(a,b) in the region ±(a,b). Moreover,
it is possible to see some pattern in this map, thanks to the parallel between
the vector difference/sum rule and Property 2 of quadratic forms proved
in the previous section. We show this in the next section, assisted by the
invariance of the determinant AC - B^2/4 under change of variables. The
complete map also displays such changes, as we are about to see.


The tree of integral bases 


In Section 5.6 we defined forms f, f* to be equivalent if f*(x,y) results
from f(x,y) by replacing the vector (x,y) by a vector (px+qy, rx+sy),
which is equivalent to it in the sense that (px+qy, rx+sy) runs through Z^2
when (x,y) does. Since


(x,y) = x(1,0) + y(0,1) and (px+qy, rx+sy) = x(p,r) + y(q,s),


this amounts to replacing the vectors (1,0) and (0,1) by the new vectors
(p,r) and (q,s). We call the pair of vectors (1,0) and (0,1) an integral
basis of Z^2 because any integer vector (x,y) is a linear combination of
them with integer coefficients, namely x(1,0) + y(0,1).

Equivalence says that the replacement M : (x,y) ↦ (px+qy, rx+sy) is
invertible, so the inverse matrix M^{-1} has integer coefficients and the new
vectors also form an integral basis. Thus the criterion for a pair of vectors
(p,r) and (q,s) to form an integral basis is the criterion derived in Section
5.6 for M and M^{-1} to be integral, namely ps - qr = ±1.
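
The criterion ps - qr = ±1 is easy to test, and when it holds the integer
coordinates of any vector in the new basis come from Cramer's rule. The following
is a minimal sketch, not part of the original text; is_integral_basis and
coordinates are hypothetical helper names.

```python
def is_integral_basis(v, w):
    """True when v = (p, r) and w = (q, s) satisfy ps - qr = ±1."""
    (p, r), (q, s) = v, w
    return p * s - q * r in (1, -1)

def coordinates(v, w, target):
    """Integer coefficients (a, b) with a*v + b*w == target, for an integral basis."""
    (p, r), (q, s) = v, w
    det = p * s - q * r
    if det not in (1, -1):
        raise ValueError("v, w is not an integral basis")
    x, y = target
    # Cramer's rule; dividing by det = ±1 is the same as multiplying by det.
    a = (s * x - q * y) * det
    b = (p * y - r * x) * det
    return a, b

print(is_integral_basis((35, 3), (23, 2)))    # 35*2 - 23*3 = 1
print(coordinates((35, 3), (23, 2), (1, 0)))  # 2*(35,3) - 3*(23,2) = (1,0)
```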

Now in Section 2.7 we showed that, if (p,r) and (q,s) are labels on
two regions with a common edge in the map of relatively prime pairs, then


ps - rq = ±1.


It is easily seen that this property extends to the complete map of Figure
5.2. Thus each edge in the map of primitive vectors represents an integral
basis of Z^2, namely the pair of labels on the regions that meet along the
edge. The ± signs on the labels give four different bases, but they are
essentially the same. Since the edges of the map form a tree, and each 
edge is associated in this way with an integral basis (up to sign), we call 
the edge complex of the map of primitive vectors the tree of integral bases. 

As the name suggests, the tree represents all integral bases. We do not
need this fact. However, it is easy to prove using the vector difference/sum 
rule to implement a kind of Euclidean algorithm (see exercises). 


Exercises 


To prove that the vector difference/sum rule holds in the complete map of primitive 
vectors we check that it holds in the middle and in “general position” on the right 
and left. 


5.7.1 Verify that the difference/sum rule holds in the middle of the map (Figure
5.4) by choosing v_1 = (0,1) and v_2 = (1,0).


Figure 5.4: The middle of the complete map


5.7.2 Figure 5.5 shows one “general position” on the right side of the complete
map. By choosing v_1 = u_1 and v_2 = u_1 + u_2, verify that the difference/sum
rule holds here.


Figure 5.5: One “general position” on the right


5.7.3 Work out which other general positions occur on the right and on the left 
and verify that the difference/sum rule holds for each of them. 


5.7.4 The “vector sum/difference rule” shown in Figure 5.6 is also valid. Why? 


Figure 5.6: The sum/difference rule 


To prove that the tree in the complete map represents all integral bases we
use the difference/sum and sum/difference rules to trace a path from a given basis
{(p,r),(q,s)} back to {(1,0),(0,1)}. Exercise 5.7.5 is an example, and Exercises
5.7.6-5.7.8 show why such a path can always be found.


5.7.5 By repeatedly subtracting the “smaller” vector from the “larger”, reduce the
pair {(35,3),(23,2)} to the pair {(1,0),(11,1)}. The latter pair is represented
in the tree (why?), hence so is the former (why?).

5.7.6 Show that if

(p',r') = (p+q, r+s), (q',s') = (q,s)


or


then ps - qr = ±1 ⇔ p's' - q'r' = ±1.


5.7.7 By repeatedly adding or subtracting one vector from the other, show that
any pair {(p,r),(q,s)} with ps - qr = ±1 reduces to a pair of the form
{(p',0),(q',s')}. (Hint: gcd(r,s) = 1. Why?) Deduce from Exercise 5.7.6
that p' = ±1, s' = ±1.


5.7.8 Deduce that {(p',0),(q',s')} in Exercise 5.7.7 is represented by an edge in
the tree, and hence so is {(p,r),(q,s)}.
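
The subtractive reduction in Exercise 5.7.5 can be traced by machine. The sketch
below is illustrative only and not part of the original text. It assumes both
vectors have nonnegative coordinates and satisfy ps - qr = ±1, takes the
“larger” vector to be the one with the larger coordinate sum, and records every
intermediate pair; reduce_basis is a hypothetical name.

```python
def reduce_basis(v, w):
    """Trace the subtractive Euclidean algorithm from {v, w} to {(1,0), (0,1)}."""
    (p, r), (q, s) = v, w
    if min(p, r, q, s) < 0 or p * s - q * r not in (1, -1):
        raise ValueError("expects nonnegative vectors with ps - qr = ±1")
    path = [(v, w)]
    while {v, w} != {(1, 0), (0, 1)}:
        if sum(v) < sum(w):
            v, w = w, v                    # keep v as the "larger" vector
        v = (v[0] - w[0], v[1] - w[1])     # subtract the smaller from the larger
        path.append((v, w))
    return path

path = reduce_basis((35, 3), (23, 2))
for pair in path[:4]:
    print(pair)
```

After three subtractions the pair is {(1,0),(11,1)}, as in the exercise; further
subtractions of (1,0) from (11,1) lead down to {(1,0),(0,1)}.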

5.8 *Periodicity in the map of x^2 - ny^2


In the previous section we briefly mentioned how a map of any quadratic
form f may be superimposed on the map of primitive vectors by marking
the region ±v with the value f(v) = f(-v). We now investigate maps of
quadratic forms in more depth and, to get an idea of what to expect, we
first present the map of x^2 - 3y^2 in Figure 5.7. Only the right half is shown,
because the left half is its mirror image. The values are marked as numbers
in circles.


Figure 5.7: The map of x^2 - 3y^2


In this map there seems to be a single dividing line between positive
and negative values of x^2 - 3y^2. Conway calls this line the river, and
we have drawn it heavily in Figure 5.7. On either side of the river the
values of x^2 - 3y^2 appear to increase in absolute value as one moves away
from it (which is why one expects there to be only one river). And, rather
unexpectedly, the values along the river seem to be periodic: in successive
regions “above” the river the values are -3, -2, -3, -2, ... and below each
pair of successive regions with values -3, -2 there is a single region with
value 1. Figure 5.8 confirms the pattern a bit further.


Figure 5.8: The river for x^2 - 3y^2


If this pattern continues indefinitely, then we can generate the sequence
of positive solutions of the Pell equation x^2 - 3y^2 = 1, namely (2,1), (7,4),
(26,15), ..., by applying the vector addition rule for the map of primitive
vectors to locate the successive regions with value 1 (see exercises).
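
The idea of following the river by vector addition can be sketched in code. The
following is an illustration under stated assumptions, not the book's procedure
spelled out: it starts at the edge between ±(1,0) and ±(0,1), always steps to the
region labelled by the sum of the two current labels, and keeps whichever pair
still has values of opposite sign. The function name pell_solutions_by_river is
hypothetical.

```python
from math import isqrt

def pell_solutions_by_river(n, count):
    """Walk the river of x^2 - n*y^2 and collect regions with value 1."""
    if isqrt(n) ** 2 == n:
        raise ValueError("n must be a nonsquare natural number")

    def f(v):
        return v[0] * v[0] - n * v[1] * v[1]

    pos, neg = (1, 0), (0, 1)   # f(pos) = 1 > 0 > f(neg) = -n
    solutions = []
    while len(solutions) < count:
        nxt = (pos[0] + neg[0], pos[1] + neg[1])   # vector sum rule
        if f(nxt) > 0:
            pos = nxt            # the river now runs between nxt and neg
            if f(nxt) == 1:
                solutions.append(nxt)
        else:
            neg = nxt            # the river now runs between pos and nxt
    return solutions

print(pell_solutions_by_river(3, 3))
```

For n = 3 the regions visited have values -2, 1, -3, -2, 1, ..., matching the
pattern described for Figures 5.7 and 5.8.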


The example of x^2 - 3y^2 is a good example of what happens with any
indefinite quadratic form, that is, one that takes both positive and negative
values but not the value zero. With the help of the following proposition
we can show that any indefinite quadratic form has a unique “river”, with
periodic behavior.

Arithmetic progression rule. If L, U, D, R (for “left”, “up”, “down”,
“right”) are the values of a quadratic form f around an edge as shown in
Figure 5.9 then


Figure 5.9: Values in regions around an edge


1. L, U + D, R is an arithmetic progression.

2. If (p,r) and (q,s) respectively are the regions above and below the
edge, then the common difference in this progression is the coefficient
of xy in the quadratic form f(px+qy, rx+sy).

Proof. The difference/sum rule in the map of primitive vectors (Section
5.7) implies that


L = f(v_1 - v_2), U = f(v_1), D = f(v_2), R = f(v_1 + v_2),


where v_1 and v_2 are the regions above and below the middle edge. It then
follows from Property 2 of quadratic forms (Section 5.6) that


L + R = 2(U + D), or (U + D) - L = R - (U + D),


and this says that L, U + D, R is an arithmetic progression.

Recall from Section 5.7 that if the basis i = (1,0), j = (0,1) of Z^2 is
replaced by the basis v_1 = (p,r), v_2 = (q,s), then the form f(x,y) is replaced
by the equivalent form f*(x,y) = f(px+qy, rx+sy) = Ax^2 + Bxy + Cy^2,
say. Also, the values of f at v_1, v_2, v_1 + v_2 and v_1 - v_2 are the same as
the values of f* at i, j, i + j and i - j, namely A, C, A + B + C and A - B + C
respectively.

Thus the common difference, (U + D) - L, of the arithmetic progression
is A + C - (A - B + C) = B, as claimed. □
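
Both parts of the rule can be checked on examples. The sketch below is
illustrative and not part of the original text; a form Ax^2 + Bxy + Cy^2 is
represented by the triple (A, B, C), and the helper names are assumptions. The
xy coefficient of f(px+qy, rx+sy) is obtained by expanding the substitution,
which gives 2Apq + B(ps + qr) + 2Crs.

```python
def form_value(f, v):
    """Value of the form f = (A, B, C) at the vector v = (x, y)."""
    a, b, c = f
    x, y = v
    return a * x * x + b * x * y + c * y * y

def around_edge(f, v1, v2):
    """Values L, U, D, R around the edge between regions ±v1 (above) and ±v2 (below)."""
    diff = (v1[0] - v2[0], v1[1] - v2[1])
    total = (v1[0] + v2[0], v1[1] + v2[1])
    return (form_value(f, diff), form_value(f, v1),
            form_value(f, v2), form_value(f, total))

def xy_coefficient(f, v1, v2):
    """Coefficient of xy in f(px + qy, rx + sy), where v1 = (p, r) and v2 = (q, s)."""
    a, b, c = f
    (p, r), (q, s) = v1, v2
    return 2 * a * p * q + b * (p * s + q * r) + 2 * c * r * s

pell3 = (1, 0, -3)                       # the form x^2 - 3y^2
L, U, D, R = around_edge(pell3, (1, 1), (1, 0))
print(L, U + D, R, xy_coefficient(pell3, (1, 1), (1, 0)))
```

Here L, U + D, R is -3, -1, 1, an arithmetic progression whose common difference
2 agrees with the computed xy coefficient.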


Part 1 of the arithmetic progression rule is enough to show: 


Uniqueness of the river. For any form x^2 - ny^2, where n is a nonsquare
natural number, there is a unique edge path in the map of primitive vectors
that separates regions of positive value from regions of negative value.


Proof. Such a form is never zero, because x^2 - ny^2 = 0 implies n = x^2/y^2
is a square; and x^2 - ny^2 certainly takes both positive and negative values.
Consider a place on its map where a region of value L < 0 meets two
regions with values U, D > 0 as in Figure 5.9. (If the region with value
L is actually on the right, it is still true that L, U + D, R is an arithmetic
progression.)

Then Part 1 implies that R - (U + D) = (U + D) - L > U + D, hence
R > max(U,D). Thus moving one edge away from the border between
positive and negative values leads to a region of greater positive value.

More generally, if D > max(U,L) then R > D by a similar application
of Part 1, so it follows that values of regions continually increase as we
move further from the negative region. Similarly, values on the negative
side continually decrease as we move further from the boundary path
between positive and negative regions. Hence there is only one edge path
separating the positive- from negative-valued regions. □

We need Part 2 of the arithmetic progression rule to prove the more
difficult periodicity property, which guarantees the existence of nontrivial
solutions of the Pell equation.


Periodicity of the river. When n is a nonsquare natural number, the pattern
of values along the sides of the river for x^2 - ny^2 is periodic.


Proof. It will suffice to prove that the values of regions sharing edges with
the river are bounded in absolute value. Indeed, if that is so, the values L, U
and D in Figure 5.9 around some edge in the river will recur; hence so will the
value R (being determined by L, U and D according to the arithmetic progression
rule), whose region also shares an edge with the river, and so on.

As we saw in the proof of Part 2, the values U and D equal A and
C, where Ax^2 + Bxy + Cy^2 is a quadratic form f* equivalent to f(x,y) =
x^2 - ny^2. But we know from Section 5.6 that the determinant AC - B^2/4
is the same for all equivalents f* of f. Here A and C, being the values of
regions on opposite sides of the river, have opposite signs. Hence


|AC - B^2/4| = |A||C| + B^2/4.


Since AC - B^2/4 is constant, it follows that |A| and |C| (the absolute values
of U and D) are bounded as required. □


Exercises 


The “Pell quadratic forms” x^2 - ny^2 are by no means the only indefinite forms.
Another interesting example is x^2 + xy - y^2, which is related to the golden ratio
(1+√5)/2 and the Fibonacci sequence 1, 1, 2, 3, 5, 8, 13, ....


5.8.1 Show that x^2 + xy - y^2 = (x + y(1+√5)/2)(x + y(1-√5)/2) and deduce from
this that the form x^2 + xy - y^2 is indefinite.


5.8.2 Construct enough of the river for x^2 + xy - y^2 to show that its period looks
like Figure 5.10.


Figure 5.10: The period of x^2 + xy - y^2




5.8.3 Show that the positive labels (x_i,y_i) alternately below and above the river
(in the regions marked alternately 1 and -1) satisfy


py )= C1), Op) + OR) = OJ): 


5.8.4 Deduce from Exercise 5.8.3 that the natural number pairs satisfying the
equation x^2 + xy - y^2 = 1 are (Fon 4 Foy) for n = 0,1,2,3,..., where


Periodicity in the shape of the river leads naturally to recurrence relations
between the vectors labelling riverside regions. The Fibonacci relation arising
from x^2 + xy - y^2 is the simplest example of such a recurrence relation. Another
is the relation for x^2 - 3y^2, whose river was constructed above.


5.8.5 Use two successive periods in the river for x^2 - 3y^2 to show that the
nonnegative solutions (x_i,y_i) of x^2 - 3y^2 = 1 satisfy


(x_0,y_0) = (1,0), (x_{i+1},y_{i+1}) = 4(x_i,y_i) - (x_{i-1},y_{i-1}).


The river also shows why certain equations do not have solutions. 


5.8.6 Explain why the equation x^2 - 3y^2 = -1 has no integer solution.


5.9 Discussion


The Pell equation x^2 - ny^2 = 1 is one of the oldest and most important
quadratic Diophantine equations. Probably its only rival is the Pythagorean
equation x^2 + y^2 = z^2. The Pell equation also dates back to the time of the
Pythagoreans (around 500 BCE), who studied the special case x^2 - 2y^2 = 1
in connection with √2, as mentioned in Section 5.1.

Another famous Pell equation is due to Archimedes. His “cattle problem”
leads to the Pell equation x^2 - 4729494y^2 = 1, the least nontrivial
solution of which has an x with 206545 digits! This solution was surely not
known to Archimedes, though perhaps he knew that Pell equations could 
have remarkably large solutions. For an excellent discussion of the cattle 
problem, and the computational issues it raises, see Lenstra (2002). 

The Pell equation was rediscovered in India, where mathematicians
were also fascinated by short questions with long answers. Around 600
CE, Brahmagupta discovered the formula for composing solutions we used
in Section 5.4. He used a generalization of it to find the minimal solution
(1151, 120) of x^2 - 92y^2 = 1 (saying that “a person solving this equation
within a year is a mathematician”). In 1150 CE Bhaskara II extended
Brahmagupta’s idea to a method that solves all Pell equations, illustrating it with
the well chosen example x^2 - 61y^2 = 1. He found its minimal solution,
(1766319049, 226153980), which is by far the largest minimal solution of
any Pell equation x^2 - ny^2 = 1 with n ≤ 61.

In Europe nothing was known of the Indian discoveries, but the Pell 
equation resurfaced in the 17th century when Fermat independently dis- 
covered how to solve it. He did not reveal his method, but he evidently 
knew what he was doing, because he too picked x^2 - 61y^2 = 1 as a
challenge to other mathematicians. He also posed the even more formidable
equation x^2 - 109y^2 = 1, the minimal solution of which is


(158070671986249, 15140424455100).


His English rivals Wallis and Brouncker rose to the challenge with a method
that solves the Pell equation, not unlike the method of Bhaskara II (see Weil
(1984), p. 94). In the 18th century these methods morphed into the simpler
and more elegant continued fraction algorithm, which can be viewed as the
Euclidean algorithm applied to the pair (√n, 1).

All of these methods are based on the observation of periodicity in
certain computations. It is likely that the ancient Greeks observed periodicity
in the Euclidean algorithm, because simple geometric arguments show its
periodicity on pairs such as (√2, 1) and (√3, 1) (see, for example, Stillwell
(1998), p. 268, or Artmann (1999), p. 242). However, while many could
use periodicity to solve instances of the Pell equation, the first to prove
that periodicity always occurs was Lagrange (1768). He thereby showed
that the continued fraction method always works. He underlined the
importance of this result by showing that solving the Pell equation leads to
the solution of all quadratic Diophantine equations in two variables.

Conway’s visual approach, expounded in Sections 5.6—5.8, is certainly 
related to the old approaches to the Pell equation. But it is essentially 
simpler in that it replaces a process (the Euclidean algorithm) by a picture 
(the map of primitive vectors). I have attempted to make this as clear as 
possible by deriving the map of primitive vectors and its properties directly 
from properties of the Euclidean algorithm, before imprinting the values 
of a quadratic form on it. (Conway assumes the simplest properties of the 
map, or sketches topological proofs, and proves others with the help of 
quadratic forms.) For further insights obtainable from Conway’s approach, 
see the book Conway (1997) or his related video ax^2 + hxy + by^2 available
from the American Mathematical Society. 


PREVIEW 


The Gaussian integers Z[i] are the simplest generalization of the
ordinary integers Z and they behave in much the same way. In particular,
Z[i] enjoys unique prime factorization, and this allows us to
reason about Z[i] the same way we do about Z. We do this because
Z[i] is the natural place to study certain properties of Z. In particular,
it is the best place to examine sums of two squares, because in
Z[i] a sum of two squares factorizes: x^2 + y^2 = (x - yi)(x + yi).


In the present chapter we use this idea to prove a famous theorem
of Fermat: if p > 2 is prime then p = a^2 + b^2, for some natural
numbers a and b, if and only if p = 4n + 1 for some natural number
n. The Fermat two square theorem turns out to be related, not only
to unique prime factorization in Z[i], but also to the actual “primes”
of Z[i], the so-called Gaussian primes.


The Gaussian primes are easily shown to include the ordinary primes
that are not sums of two squares, and the factors a - bi and a + bi of
each ordinary prime of the form a^2 + b^2. Unique prime factorization
in Z[i] establishes that these are the only Gaussian primes, up to
multiples by ±1 and ±i.


An easy congruence argument shows that ordinary primes of the
form 4n + 3 are not sums of two squares. The two square theorem
then shows that the primes that are sums of two squares are 2 and
all the remaining odd primes, namely, those of the form 4n + 1.

The proof of the two square theorem involves an important lemma
proved with the help of Wilson’s theorem: each prime p = 4n + 1
divides a number of the form m^2 + 1. Since m^2 + 1 factorizes in Z[i],
it follows from unique prime factorization that p does also. The
factorization of p turns out to be of the form (a - bi)(a + bi), hence
p = (a - bi)(a + bi) = a^2 + b^2, as claimed.


6.1 Z[i] and its norm


In the last chapter we saw that certain questions about Z are clarified by
working with generalized integers, in particular, working in Z[√n] to solve
x^2 - ny^2 = 1 in Z. The role of Z[√n] in this case is to allow the factorization


x^2 - ny^2 = (x - y√n)(x + y√n).

Similarly, when studying x^2 + y^2, it helps to use the Gaussian integers

Z[i] = {a + bi : a, b ∈ Z}


because x^2 + y^2 = (x - yi)(x + yi).

Sums of two squares, x^2 + y^2, are the oldest known topic in number
theory. We have already seen results about them found by the Babylonians,
Euclid, and Diophantus. In fact, it could be said that some properties of Z[i]
itself go back this far; at least, as far as Diophantus.


Diophantus apparently knew the two square identity (Section 1.8)

(a_1^2 + b_1^2)(a_2^2 + b_2^2) = (a_1a_2 - b_1b_2)^2 + (a_1b_2 + b_1a_2)^2


because he knew that the product of sums of two squares is itself the sum
of two squares. Today we recognize this formula as equivalent to the
multiplicative property of absolute value,


|z_1||z_2| = |z_1z_2|,


where z_1 = a_1 + b_1i and z_2 = a_2 + b_2i. And Diophantus’ identity is exactly
the formula


norm(a_1 + b_1i) norm(a_2 + b_2i) = norm((a_1 + b_1i)(a_2 + b_2i)), (*)

where “norm” denotes the norm of Z[i],


norm(a + bi) = |a + bi|^2 = a^2 + b^2.
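
The multiplicative property (*) is easy to confirm on examples. In the sketch
below, which is an illustration and not part of the original text, a Gaussian
integer a + bi is represented by the pair (a, b); the names norm and multiply
are assumptions.

```python
def norm(z):
    """norm(a + bi) = a^2 + b^2, with a + bi stored as the pair (a, b)."""
    a, b = z
    return a * a + b * b

def multiply(z, w):
    """(a1 + b1 i)(a2 + b2 i) = (a1 a2 - b1 b2) + (a1 b2 + b1 a2) i."""
    (a1, b1), (a2, b2) = z, w
    return (a1 * a2 - b1 * b2, a1 * b2 + b1 * a2)

z1, z2 = (1, 2), (3, 4)
product = multiply(z1, z2)
print(product, norm(z1) * norm(z2), norm(product))
```

With these inputs the product is -5 + 10i, and both sides of (*) equal 125, which
is the two square identity (1^2 + 2^2)(3^2 + 4^2) = 5^2 + 10^2.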

Exercises


When discussing factorization there are always trivial factors, called units, that we
prefer to ignore. For example, in N the only unit is 1, in Z the units are 1 and -1,
and in Z[i] the units are the elements of norm 1.


Likewise, the units of Z[√n] are its elements of norm 1, that is, the numbers
a + b√n with a^2 - nb^2 = 1.


6.1.2 Describe the units of Z[√2].


6.1.3 Show that Z[√n] has infinitely many units for any nonsquare natural
number n.


6.2 Divisibility and primes in Z[i] and Z


The Z[i] norm

norm(a + bi) = |a + bi|^2 = a^2 + b^2


is more useful in number theory than the absolute value because the norm
is always an ordinary integer. The multiplicative property of the norm (*)
implies that, if a Gaussian integer α divides a Gaussian integer γ, that is, if


γ = αβ for some β ∈ Z[i],


then

norm(γ) = norm(α)norm(β),


that is, norm(α) divides norm(γ).

Because of this, questions about divisibility in Z[i] often reduce to
questions about divisibility in Z. In particular, it is natural to define a
Gaussian prime to be a Gaussian integer that is not the product of Gaussian
integers of smaller norm. Then we can answer various questions about
Gaussian primes by looking at norms.


Examples.


1. 4 + i is a Gaussian prime.
Because norm(4 + i) = 16 + 1 = 17, which is a prime in Z. Hence
4 + i is not the product of Gaussian integers of smaller norm, because
no such norms divide 17.

2. 2 is not a Gaussian prime.
Because 2 = (1 - i)(1 + i) and both 1 - i and 1 + i have norm 2,
which is smaller than norm(2) = 4.


3. 1 - i, 1 + i are Gaussian prime factors of 2.
Because norm(1 - i) = norm(1 + i) = 2 is a prime in Z, hence 1 - i
and 1 + i are not products of Gaussian integers of smaller norm.
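
The definition can be tested directly by brute force for small examples. The
sketch below is an illustration only, not part of the original text and far from
efficient: it represents a + bi as the pair (a, b) and searches for a divisor whose
norm lies strictly between 1 and the norm of the candidate. The names divides and
is_gaussian_prime are assumptions.

```python
from math import isqrt

def norm(z):
    a, b = z
    return a * a + b * b

def divides(alpha, gamma):
    """True when gamma / alpha is a Gaussian integer (alpha must be nonzero)."""
    a, b = alpha
    c, d = gamma
    n = a * a + b * b
    # gamma / alpha = gamma * conj(alpha) / norm(alpha)
    re, im = c * a + d * b, d * a - c * b
    return re % n == 0 and im % n == 0

def is_gaussian_prime(gamma):
    """gamma is prime when no divisor has norm strictly between 1 and norm(gamma)."""
    big = norm(gamma)
    if big <= 1:
        return False                      # zero and units are not primes
    bound = isqrt(big)
    for a in range(-bound, bound + 1):
        for b in range(-bound, bound + 1):
            if 1 < a * a + b * b < big and divides((a, b), gamma):
                return False
    return True

for z in [(4, 1), (2, 0), (1, -1), (1, 1), (3, 0)]:
    print(z, is_gaussian_prime(z))
```

If α divides γ with 1 < norm(α) < norm(γ), the cofactor also has smaller norm, so
this search agrees with the definition given above.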


Prime factorization in Z[i]. Any Gaussian integer factorizes into Gaussian
primes. The proof is similar to the proof in Z.


Proof. Consider any Gaussian integer γ. If γ itself is a Gaussian prime,
then we are done. If not, then γ = αβ for some α, β ∈ Z[i] with smaller
norm. If α, β are not both Gaussian primes, we factorize into Gaussian
integers of still smaller norm, and so on. This process must terminate since
norms, being natural numbers, cannot decrease forever. Hence we eventually
get a Gaussian prime factorization of γ. □


As in Z, it is not immediately clear that the prime factorization is
unique. However, we see in Section 6.4 that unique prime factorization
holds in Z[i] for much the same reasons as in Z.


Exercises 


An equivalent way to define Gaussian primes, in line with a common way of
defining ordinary primes, is to say that ϖ is a Gaussian prime if ϖ is divisible
only by units and units times ϖ. (It is conventional to use the Greek letter pi to
denote primes in Z[i] and other generalizations of Z, the way p is used to denote
ordinary primes. However, to avoid confusion with π = 3.14159... I prefer to use
ϖ, the variant form of pi.)


6.2.1 Explain why this definition is equivalent to the one above.
6.2.2 Prove that 3 is a Gaussian prime by considering the divisors of norm(3).


Ordinary primes are not always Gaussian primes, as we have already seen in
the case of 2. In fact, 2 is “almost a square” in Z[i].


6.2.3 Show that a unit times 2 is a square in Z[i].


6.2.4 Factorize 17 and 53 in Z[i].

6.3 Conjugates


The conjugate of z = a + bi is z̄ = a - bi. The basic properties of conjugation
(not only in Z[i] but for all complex numbers z) are


These can be checked by writing z_1 = a_1 + b_1i, z_2 = a_2 + b_2i and working
out both sides of each identity. We use these properties of conjugation to
take the first step towards a classification of Gaussian primes.


Real Gaussian primes. An ordinary prime p ∈ N is a Gaussian prime ⇔
p is not the sum of two squares. (And obviously p < 0 is a Gaussian prime
⇔ -p ∈ N is a Gaussian prime.)

Proof. (⇐) Suppose that we have an ordinary prime p that is not a Gaussian
prime, so it factorizes in Z[i]:


p = (a + bi)γ,


where a + bi and γ are Gaussian integers with norm < the norm p^2 of p
(and hence also of norm > 1). Taking conjugates of both sides we get


p = (a - bi)γ̄,


since p is real and hence p = p̄. Multiplying these two expressions for p
gives


p^2 = (a - bi)(a + bi)γγ̄
    = (a^2 + b^2)|γ|^2,


where both a^2 + b^2, |γ|^2 > 1. But the only such factorization of p^2 is pp,
hence p = a^2 + b^2.

(⇒) Conversely, if an ordinary prime p equals a^2 + b^2 with a, b ∈ Z
then p is not a Gaussian prime because it has the Gaussian prime
factorization

p = (a - bi)(a + bi)


into factors of norm a^2 + b^2 = p < norm(p) = p^2. □

Notice also that the factors a - bi and a + bi of p are Gaussian primes
because their norm is the prime number a^2 + b^2 = p. Moreover, all Gaussian
primes a + bi, where a, b ≠ 0, come in conjugate pairs like this. This is
so because if one member of the pair factorizes into αβ then its conjugate
factorizes into ᾱβ̄.

What is not yet clear is whether all Gaussian primes a + bi with a, b
nonzero are factors of ordinary primes p = a^2 + b^2. It is conceivable that
a + bi could be a Gaussian prime while a^2 + b^2 is a product of two or more
ordinary primes. In Section 6.4 we rule this out with the help of unique
prime factorization in Z[i].

At any rate, we can see that further clarification of the nature of Gaussian
primes depends on finding another way to describe the ordinary primes
that are sums of two squares. We saw in Section 3.7 (Example 1) that
ordinary primes of the form 4n + 3 are not sums of two squares. The
complement to this result (that any prime of the form 4n + 1 is a sum of
two squares) is a famous theorem discovered by Fermat. It is proved in
Section 6.5.


Exercises 
6.3.1 Verify the basic properties of conjugation mentioned above. 


The proof of the classification of real Gaussian primes has the following
interesting consequences.


6.3.2 Show that each ordinary prime has a distinct Gaussian prime associated 
with it. 


6.3.3 Deduce that there are infinitely many Gaussian primes. 


Since the real positive Gaussian primes are those of the form 4n + 3, another 
way to prove that there are infinitely many Gaussian primes is to show that there 
are infinitely many ordinary primes of the form 4n+ 3. The proof is along lines 
similar to Euclid’s proof in Section 1.1. 


6.3.4 Show that the product of numbers of the form 4n + 1 is of the same form.
Deduce that any number of the form 4n + 3 has a prime divisor of the form
4n + 3.


6.3.5 If p_1, p_2, ..., p_k are primes of the form 4n + 3, show that 2p_1p_2···p_k + 1 is
also of the form 4n + 3.


6.3.6 Deduce from Exercises 6.3.4 and 6.3.5 that there are infinitely many primes 
of the form 4n + 3. 

6.4 Division in Z[i]


Unique prime factorization in Z[i], as in Z, relies on the Euclidean
algorithm, which depends in turn on:


Division property of Z[i]. If α, β ≠ 0 are in Z[i] then there is a quotient μ
and a remainder ρ such that


α = μβ + ρ with |ρ| < |β|.


Proof. This property becomes obvious once one sees that the Gaussian
integer multiples μβ of any Gaussian integer β ≠ 0 form a square grid in
the complex plane.

This is because multiplication of β by i rotates the vector from 0 to β
through 90°, hence 0, β, and iβ are three corners of a square. All other
multiples of β are sums (or differences) of β and iβ, hence they lie at the
corners of a square grid. (Figure 6.1.)


Figure 6.1: Multiples of a Gaussian integer


Any Gaussian integer α lies in one of these squares, and there is a
nearest corner μβ (not necessarily unique, but no matter). Then


α = μβ + ρ, where |ρ| = distance to nearest corner,


so |ρ| is less than the side of a square, namely |β|. □
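
The nearest-corner argument translates directly into a computation. The sketch
below is an illustration, not part of the original text: it stores a + bi as the
pair (a, b), computes α/β exactly as α times the conjugate of β divided by
norm(β), and rounds each coordinate to a nearest integer. The names
nearest_int_quotient and gauss_divmod are assumptions.

```python
def nearest_int_quotient(x, n):
    """An integer nearest to x/n for n > 0, using exact integer arithmetic."""
    return (2 * x + n) // (2 * n)

def gauss_divmod(alpha, beta):
    """Quotient mu and remainder rho with alpha = mu*beta + rho, |rho| < |beta|."""
    a, b = alpha
    c, d = beta
    n = c * c + d * d
    if n == 0:
        raise ZeroDivisionError("beta must be nonzero")
    # alpha / beta = alpha * conj(beta) / norm(beta)
    re, im = a * c + b * d, b * c - a * d
    m0, m1 = nearest_int_quotient(re, n), nearest_int_quotient(im, n)
    # rho = alpha - mu * beta
    rho = (a - (m0 * c - m1 * d), b - (m0 * d + m1 * c))
    return (m0, m1), rho

mu, rho = gauss_divmod((27, 23), (8, 1))
print(mu, rho)
```

Rounding each coordinate moves at most half a grid step in each direction, so the
remainder has norm at most half of norm(β), comfortably within the bound stated
in the division property.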
Thanks to the division property we have


1. A Euclidean algorithm for Z[i]


2. gcd(α,β) = μα + νβ for some μ, ν ∈ Z[i].

3. The prime divisor property: if a prime ϖ divides αβ then ϖ divides
α or ϖ divides β.


4. Unique prime factorization up to order and factors of norm 1, namely
±1 and ±i. Elements of norm 1 are called units and unique prime
factorization usually comes with the qualification “up to unit
factors”. This is true even in Z, where the units are ±1 and hence
primes may vary up to sign.


As a first application of unique prime factorization in Z[i] we complete
the description of Gaussian primes begun in Section 6.3. There we found
that the real Gaussian primes are ordinary primes that are not sums of two
squares, and their negatives. It is also clear that the pure imaginary Gaussian
primes are of the form ±ip, where p is a real Gaussian prime. Thus it
remains to describe the Gaussian primes a + bi with a, b nonzero.


Imaginary Gaussian primes. The Gaussian primes a + bi with a, b nonzero
are factors of ordinary primes p of the form a^2 + b^2.


Proof. First, as noted in Section 6.3, if a + bi is a Gaussian prime then so
is a - bi (because if a - bi = αβ is not prime, neither is a + bi = ᾱβ̄).
Next, (a - bi)(a + bi) is a (necessarily unique) Gaussian prime
factorization of

p = a^2 + b^2 = (a - bi)(a + bi).

But p must then be an ordinary prime. Indeed, if

p = rs with 1 < r, s < p and r, s ∈ Z,


then the Gaussian prime factors of r and s give a Gaussian prime
factorization of p different from (a - bi)(a + bi) (either two real factors r and s, or
≥ four complex factors). □


Exercises 


Using unique prime factorization we can prove results on squares and cubes in
Z[i], similar to those on squares and cubes in N proved in Section 2.5. The only
difference is that we have to take account of units, as indeed we already do in Z.


6.4.1 Is it true in Z that relatively prime factors of a square are themselves squares?
If not, how should the statement be modified to make it correct?


6.4.2 Show that relatively prime factors of a cube in Z are themselves cubes.


6.4.3 Formulate a theorem about relatively prime factors of a square in Z[i].


6.4.4 Show that relatively prime factors of a cube in Z[i] are themselves cubes.




6.5 Fermat’s two square theorem 


In Section 3.7 we used congruence mod 4 to show that primes of the form
4n + 3 are not sums of two squares. Fermat’s two square theorem says that
the remaining odd primes, those of the form 4n + 1, are all sums of two
squares.

We apply the theory of Z[i] to a prime p = 4n + 1 with the help of an
m ∈ Z such that p divides m^2 + 1. Such an m always exists by a result of
Lagrange (1773) that follows from Wilson’s theorem in Section 3.5: for
any prime p,


1 × 2 × 3 × ··· × (p-1) ≡ -1 (mod p).

Lagrange’s lemma. A prime p = 4n + 1 divides m^2 + 1 for some m ∈ Z.

Proof. If we apply Wilson’s theorem to the prime p = 4n + 1 we get


-1 ≡ 1 × 2 × 3 × ··· × 4n (mod p)
   ≡ (1 × 2 × ··· × 2n) × ((2n+1) × ··· × (4n-1) × (4n)) (mod p)
   ≡ (1 × 2 × ··· × 2n) × ((-2n) × ··· × (-2)(-1)) (mod p) since p - k ≡ -k (mod p)
   ≡ (1 × 2 × ··· × 2n)^2 (-1)^{2n} (mod p)
   ≡ (1 × 2 × ··· × 2n)^2 (mod p)


Taking m = (2n)! we get m^2 ≡ -1 (mod p). That is, p divides m^2 + 1. □
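
The lemma is constructive, so the number m can be computed. A minimal sketch,
not part of the original text: the function name lagrange_m is an assumption, and
the code assumes without checking that p is prime.

```python
from math import factorial

def lagrange_m(p):
    """m = (2n)! mod p for a prime p = 4n + 1, so that p divides m^2 + 1."""
    if p % 4 != 1:
        raise ValueError("p must be of the form 4n + 1")
    n = (p - 1) // 4
    return factorial(2 * n) % p

for p in (5, 13, 17, 29):
    m = lagrange_m(p)
    print(p, m, (m * m + 1) % p)   # the last column is 0 for each prime
```

Reducing (2n)! mod p does not affect the congruence m^2 ≡ -1 (mod p) and keeps
m smaller than p.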


Fermat’s two square theorem. If $p = 4n + 1$ is prime, then $p = a^2 + b^2$
for some $a, b \in \mathbb{Z}$.

Proof. Given $p$, let $m \in \mathbb{Z}$ be such that $p$ divides $m^2 + 1$, as in the lemma.
In $\mathbb{Z}[i]$, $m^2 + 1$ has the factorization

\[ m^2 + 1 = (m - i)(m + i). \]

And, even though $p$ divides $m^2 + 1$, $p$ does not divide $m - i$ or $m + i$ because
$\frac{m}{p} - \frac{i}{p}$ and $\frac{m}{p} + \frac{i}{p}$ are not Gaussian integers.

By the Gaussian prime divisor property of Section 6.4, it follows that
$p$ is not a Gaussian prime. But then $p = a^2 + b^2$, as proved in Section 6.3. □
It also follows that

\[ p = (a - bi)(a + bi) \]

is a factorization into Gaussian primes, and we now know that any such
factorization is unique. So in fact we have a stronger form of Fermat’s two
square theorem: each prime $p = 4n + 1$ is a sum $a^2 + b^2$ of two squares for
a unique pair of natural numbers $a$, $b$.
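The proof can be turned into a computation: since p divides (m − i)(m + i) but neither factor, a greatest common divisor of p and m + i in Z[i] is a Gaussian prime a + bi of norm p. The sketch below assumes this route (a Euclidean algorithm with nearest-integer rounding, as in Section 6.4); representing Gaussian integers as pairs and the names `gauss_divmod`, `gauss_gcd`, and `two_squares` are choices made here, not taken from the text.

```python
from math import factorial

def gauss_divmod(a, b):
    """Divide a by b in Z[i]; both are (real, imag) pairs. Returns (quotient, remainder)."""
    (ar, ai), (br, bi) = a, b
    n = br * br + bi * bi                  # norm of b
    xr = ar * br + ai * bi                 # real part of a * conj(b)
    xi = ai * br - ar * bi                 # imaginary part of a * conj(b)
    qr = (2 * xr + n) // (2 * n)           # nearest integer to xr / n
    qi = (2 * xi + n) // (2 * n)           # nearest integer to xi / n
    rr = ar - (qr * br - qi * bi)
    ri = ai - (qr * bi + qi * br)
    return (qr, qi), (rr, ri)

def gauss_gcd(a, b):
    """Euclidean algorithm in Z[i]; the result is determined only up to a unit factor."""
    while b != (0, 0):
        _, r = gauss_divmod(a, b)
        a, b = b, r
    return a

def two_squares(p):
    """For a prime p = 4n + 1, return (a, b) with a <= b and a^2 + b^2 = p."""
    n = (p - 1) // 4
    m = factorial(2 * n) % p               # m^2 + 1 is divisible by p (Lagrange's lemma)
    a, b = gauss_gcd((p, 0), (m, 1))
    return tuple(sorted((abs(a), abs(b))))

for p in (5, 13, 17, 29, 37, 41):
    print(p, two_squares(p))
```

Taking absolute values and sorting removes the unit and conjugation ambiguity, which reflects the uniqueness statement above.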


Exercises 


Here is another way in which $\mathbb{Z}[i]$ throws light on sums of two squares. The
following exercises develop a proof of a theorem of Euler (1747): if $\gcd(a,b) = 1$
then any divisor of $a^2 + b^2$ is of the form $c^2 + d^2$ where $\gcd(c,d) = 1$. The main
steps depend on unique prime factorization in $\mathbb{Z}[i]$.

6.5.2 Show that each integer divisor $e > 1$ of $a^2 + b^2$ is a product of Gaussian
prime divisors $q + ir$ of $a^2 + b^2$, unique up to unit factors.

6.5.3 Show that each of the Gaussian primes $q + ir$ divides either $a - ib$ or $a + ib$.
Deduce that none of them is an ordinary prime $p$.

6.5.4 Show that, along with each Gaussian prime factor $q + ir$ of $e$, its conjugate
$q - ir$ is also a factor.

6.5.5 Deduce from Exercise 6.5.4 that $e$ is of the form $c^2 + d^2$ where $c + di$ divides
$a + bi$.

6.5.6 Deduce from Exercise 6.5.5 that $\gcd(c,d) = 1$.


6.6 Pythagorean triples 


Now is a good time to revisit the primitive Pythagorean triples, whose
relationship with $\mathbb{Z}[i]$ was suggested in Section 1.8. Since odd squares are
congruent to 1 (mod 4) and even squares are congruent to 0 (mod 4), a sum
of two odd squares is not a square. Hence in a primitive triple $(x,y,z)$ one
of $x$, $y$ is even and $z$ is odd. The argument in Section 1.8 was that if

\[ x^2 + y^2 = z^2 \]

then

\[ (x - yi)(x + yi) = z^2, \]

so $x - yi$ and $x + yi$ are Gaussian integer factors of an odd square, $z^2$. Then
we wanted to say that:
1. If $x$ and $y$ are relatively prime (in $\mathbb{Z}$) then so are $x - yi$ and $x + yi$ (in $\mathbb{Z}[i]$).
2. In $\mathbb{Z}[i]$, relatively prime factors of a square are squares.


The first statement is correct. If $\gcd(x,y) = 1$ in $\mathbb{Z}$ then $\gcd(x,y) = 1$ in
$\mathbb{Z}[i]$. This is so since a common Gaussian prime divisor is accompanied by
its conjugate, and their product is a common divisor $> 1$ in $\mathbb{Z}$. A common
divisor of $x - yi$ and $x + yi$ also divides their sum $2x$ and difference $2iy$.
Therefore, since $\gcd(x,y) = 1$, any common prime divisors of $x - iy$ and
$x + iy$ are primes $\pm 1 \pm i$ dividing 2. No such divisors are present, since they
imply that $(x - iy)(x + iy) = z^2$ is even.

The second statement is not quite correct, but the following amendment
of it is: relatively prime factors of a square are squares, up to unit factors.
This follows from unique prime factorization in $\mathbb{Z}[i]$.

Since $x - yi$ and $x + yi$ have no common Gaussian prime factor, while
each prime factor of $z^2$ occurs to an even power, each prime factor of $x - yi$,
and each prime factor of $x + yi$, must also occur to an even power. A product
of primes, each occurring to an even power, is obviously a square (compare
with the same argument for natural numbers in Section 2.5). Hence each of
$x - yi$ and $x + yi$ is a unit times a square, since their only possible nonprime
factors are units. □


The amended second statement is good enough to give us the conclusion
we expect. We have shown that $x - yi$ is a unit times a square, hence
it is one of

\[ (s - ti)^2, \quad -(s - ti)^2, \quad i(s - ti)^2, \quad -i(s - ti)^2, \quad \text{for some } s, t \in \mathbb{Z}. \]

That is, it is one of

\[ (s^2 - t^2) - 2sti, \quad t^2 - s^2 + 2sti, \quad 2st + (s^2 - t^2)i, \quad -2st + (t^2 - s^2)i. \]


In each case, equating real and imaginary parts gives one of $x$ and $y$ in
the form $u^2 - v^2$ and the other in the form $2uv$ for some natural numbers
$u$ and $v$. Thus the result is essentially the same as that obtained by the
loose argument in Section 1.8, but better, because it does not force the
even member of the pair, $2uv$, to be first.

Moreover, we necessarily have $\gcd(u,v) = 1$ because any common
prime divisor of $u$ and $v$ is a common divisor of $u^2 - v^2$ and $2uv$, hence
of $x$ and $y$. Thus the correct outcome of the speculation in Section 1.8 is:
Primitive Pythagorean triples. If $x^2 + y^2 = z^2$ for some relatively prime
natural numbers $x$ and $y$, then one of $x$ and $y$ is of the form $u^2 - v^2$ and the
other of the form $2uv$, for relatively prime natural numbers $u$ and $v$. □


We also find in each case that $z = u^2 + v^2$, because

\[ (u^2 - v^2)^2 + (2uv)^2 = u^4 + 2u^2v^2 + v^4 = (u^2 + v^2)^2. \]

Thus $z$ is a sum of two squares. Since $u$ and $v$ are any relatively prime
numbers, and a prime $u^2 + v^2$ necessarily has $\gcd(u,v) = 1$, $z$ can be any
prime sum of two squares. Thus we get a geometric characterization of the
primes that are sums of two squares.
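The parametrization can be tried out directly. The sketch below enumerates triples (u² − v², 2uv, u² + v²) for relatively prime u > v; it additionally assumes that u and v have opposite parity, a condition added here so that u² − v² is odd and the triple is primitive. The function name `primitive_triples` is illustrative only.

```python
from math import gcd

def primitive_triples(limit):
    """Yield (x, y, z) = (u^2 - v^2, 2uv, u^2 + v^2) with hypotenuse z <= limit."""
    u = 2
    while u * u + 1 <= limit:              # smallest possible z for this u is u^2 + 1
        for v in range(1, u):
            # Opposite parity (an assumption added here) makes x odd;
            # gcd(u, v) = 1 then makes x and y relatively prime.
            if (u - v) % 2 == 1 and gcd(u, v) == 1:
                x, y, z = u * u - v * v, 2 * u * v, u * u + v * v
                if z <= limit:
                    yield x, y, z
        u += 1

triples = list(primitive_triples(50))
for t in triples:
    print(t)
```

Each printed triple satisfies x² + y² = z², and the prime values of z that appear are all of the form 4n + 1.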


Prime hypotenuses. The primes that are sums of two squares are those 
that occur as hypotenuses of right-angled triangles with integer sides. 


Exercises 


The last result, together with Fermat’s two square theorem, shows that the primes
of the form $4n + 1$ are precisely those occurring as hypotenuses of integer
right-angled triangles.
6.6.1 Find integer right-angled triangles with hypotenuses 5, 13, 17 (you should 
know these), and 29, 37, and 41. 
6.6.2 Given a prime p = 4n + 1, is the integer right-angled triangle with hy- 
potenuse p unique? 
The argument above shows that, if $(x,y,z)$ is a primitive Pythagorean triple,
then $x + yi$ is a unit times a square in $\mathbb{Z}[i]$. But once we know that $x = u^2 - v^2$,
$y = 2uv$ we can say more.


6.6.3 If $(x,y,z)$ is a primitive Pythagorean triple with $x$ odd, show that $x + yi$ is a
square in $\mathbb{Z}[i]$.
6.6.4 Verify directly that $3 + 4i$ is a square in $\mathbb{Z}[i]$.
It should be clear from your answer to Question 6.6.3 that finding the
parameters $u$ and $v$ for a given primitive Pythagorean triple $(x,y,z)$, with $x$ odd, is
equivalent to finding the square root(s) of a complex number.


6.6.5 Find the square root of $5 + 12i$.


6.6.6 If you have some software for computing square roots of complex numbers,
verify that each entry $(x,y,z)$ in Plimpton 322 (Section 1.6), except the
triple (60,45,75), yields a $y + xi$ that is a square in $\mathbb{Z}[i]$. (Note: this includes
the last triple (90,56,106), which is clearly not primitive.)


6.6.7 Explain how to compute the square root of a complex number by hand, 
using quadratic equations. 




6.7 *Primes of the form $4n + 1$


Lagrange’s lemma, proved in Section 6.5, is actually half of an important
result concerning the so-called “quadratic character of −1” that we
study further in Chapter 9. Here we use it to prove that there are infinitely
many primes of the form $4n + 1$, complementing the corresponding easy
result about primes of the form $4n + 3$ proved in Exercises 6.3.4–6.3.6.


Quadratic character of −1. The congruence $x^2 \equiv -1 \pmod{p}$, where $p$
is an odd prime, has a solution precisely when $p = 4n + 1$.

Proof. When $p = 4n + 1$, Lagrange’s lemma gives an $x$ with $x^2 \equiv -1 \pmod{p}$.
To show that $x^2 \equiv -1 \pmod{p}$ has no solution when $p = 4n + 3$ we
suppose, on the contrary, that it does.

If

\[ x^2 \equiv -1 \pmod{p = 4n + 3} \]

then raising both sides to the power $2n + 1$ gives

\[ (x^2)^{2n+1} \equiv (-1)^{2n+1} = -1 \pmod{p = 4n + 3}. \]

Since $2(2n + 1) = 4n + 2 = p - 1$, this says that

\[ x^{p-1} \equiv -1 \pmod{p}, \]

contrary to Fermat’s little theorem, which gives $x^{p-1} \equiv 1 \pmod{p}$.
Hence $x^2 \equiv -1 \pmod{p}$ has no solution when $p = 4n + 3$. □
Thus solutions of $x^2 \equiv -1 \pmod{p}$ occur precisely when the odd prime
$p$ is of the form $4n + 1$. To put it another way: the odd primes $p$ that divide
values of $x^2 + 1$, for $x \in \mathbb{Z}$, are precisely the primes $p = 4n + 1$.
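This characterization is easy to test by brute force for small primes. The following sketch (helper names `is_prime` and `has_sqrt_of_minus_one` are introduced here) simply searches for a solution of the congruence for each odd prime below 200.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def has_sqrt_of_minus_one(p):
    """True if x^2 = -1 (mod p) has a solution x."""
    return any((x * x + 1) % p == 0 for x in range(1, p))

odd_primes = [p for p in range(3, 200) if is_prime(p)]
agree = all(has_sqrt_of_minus_one(p) == (p % 4 == 1) for p in odd_primes)
print(agree)   # the theorem predicts True
```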


Infinitude of primes $4n + 1$. There are infinitely many primes of the form
$p = 4n + 1$.

Proof. From what we have just proved, it suffices to show that infinitely
many primes divide values of $x^2 + 1$ for $x \in \mathbb{Z}$. Suppose on the contrary
that only finitely many primes $p_1, p_2, \ldots, p_k$ divide values of $x^2 + 1$.

Now consider the polynomial

\[ (p_1 p_2 \cdots p_k y)^2 + 1 = g(y). \]

Clearly, any prime $p$ that divides a value of $g(y)$, for $y \in \mathbb{Z}$, also divides
a value of $x^2 + 1$ (namely, for $x = p_1 p_2 \cdots p_k y$). But none of $p_1, p_2, \ldots, p_k$
divides $g(y)$, because each leaves remainder 1.
Therefore, no prime divides $g(y)$, for any $y \in \mathbb{Z}$, and hence the only
possible values of the integers $g(y)$ are $\pm 1$. In other words,

\[ (p_1 p_2 \cdots p_k y)^2 + 1 = \pm 1 \quad \text{for all } y \in \mathbb{Z}. \]

But this is absurd, because each of the quadratic equations

\[ (p_1 p_2 \cdots p_k y)^2 + 1 = 1 \quad \text{and} \quad (p_1 p_2 \cdots p_k y)^2 + 1 = -1 \]

has at most two solutions $y$. This contradiction shows that $x^2 + 1$ is divisible
by infinitely many primes, as required. □
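The idea of the proof can also be used constructively. The sketch below is a variant, not the proof itself: starting from a list of known primes of the form 4n + 1, it takes x to be twice their product (the factor 2 is an assumption added here so that x² + 1 is odd) and extracts a prime factor of x² + 1, which must be a new prime of the form 4n + 1. The function names are illustrative.

```python
def smallest_prime_factor(n):
    """Smallest prime factor of n >= 2, by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

def new_prime_1_mod_4(known):
    """Return a prime p = 4n + 1 not in `known`, a list of odd primes."""
    x = 2                                  # makes x^2 + 1 odd (added assumption)
    for p in known:
        x *= p
    # Each prime in `known`, and 2, leaves remainder 1 on dividing x^2 + 1,
    # so every prime factor of x^2 + 1 is new and odd, hence of the form 4n + 1.
    return smallest_prime_factor(x * x + 1)

primes = [5]
for _ in range(2):
    primes.append(new_prime_1_mod_4(primes))
print(primes)
```

Only two rounds are run because trial division becomes slow as x² + 1 grows.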


It now follows, by Fermat’s two square theorem, that infinitely many 
primes are sums a* + b? of two squares. Hence there are infinitely many 
Gaussian primes a+ ib that are neither real nor pure imaginary. 


Exercises 


The argument just used to prove that $x^2 + 1$ is divisible by infinitely many primes
can be generalized to any nonconstant polynomial $f(x)$ with integer coefficients.
We suppose that

\[ f(x) = a_m x^m + \cdots + a_1 x + a_0, \quad \text{where } a_0, a_1, \ldots, a_m \in \mathbb{Z} \text{ and } a_0, a_m \neq 0, \]

has values divisible only by the primes $p_1, p_2, \ldots, p_k$, and consider the polynomial

\[ f(a_0 p_1 p_2 \cdots p_k y) = a_0 g(y), \]

where $g(y)$ is a polynomial of degree $m$.


6.7.1 Show that $g(y)$ has integer coefficients, constant term 1, and that any prime
dividing a value of $g(y)$, for $y \in \mathbb{Z}$, also divides a value of $f(x)$, for $x \in \mathbb{Z}$.

6.7.2 Show, however, that none of $p_1, p_2, \ldots, p_k$ divide $g(y)$ when $y \in \mathbb{Z}$.

6.7.3 Deduce from Exercise 6.7.2 that $g(y) = \pm 1$ for any $y \in \mathbb{Z}$.

6.7.4 Show that the equations $g(y) = 1$ and $g(y) = -1$ have only finitely many
solutions, which contradicts Exercise 6.7.3. (Where have you assumed that
$f(x)$ is nonconstant?)


This contradiction shows that f(x) is divisible by infinitely many primes. But 
now notice: we did not assume that infinitely many primes exist, hence this is a 
self-contained proof of Euclid’s theorem that there are infinitely many primes. 


6.7.5 Is this argument essentially different from Euclid’s? 
Discussion


The two square theorem was stated without proof by Fermat in 1640,
though he claimed to have a proof by descent: assuming there is a prime
that is of the form $4n + 1$ but not a sum of two squares he could show that
there is a smaller prime with the same property. The first known proof of
the theorem was in fact by descent, and published by Euler (1755). It cost
him several years of effort.

Today it is possible to give quite simple proofs with the help of the re- 
sult we called Lagrange’s lemma in Section 6.5. Lagrange himself proved 
this lemma by means of Fermat’s little theorem and his own theorem on 
the number of solutions of congruences mod p (Section 3.5). 

Lagrange (1773) used his lemma together with his theory of equivalence
of quadratic forms (Section 5.6) to give a new proof of the two square
theorem. The part of the proof involving quadratic forms was simplified by
Gauss (1801), long before his creation of the Gaussian integers. It seems
that Gauss had the main results about $\mathbb{Z}[i]$, including unique prime
factorization, around 1815, but they were first published in 1832. The proof
used in this chapter, combining unique prime factorization in $\mathbb{Z}[i]$ with
Lagrange’s lemma, is due to Dedekind (1894).

Yet another popular proof uses the geometry of numbers, developed in 
the 1890s by Minkowski. It may be found in Scharlau and Opolka (1985) 
together with a historical introduction to Minkowski’s results. 

Parallel to all the popular proofs of the two square theorem there are
analogous proofs of the four square theorem of Lagrange (1770): every
natural number is the sum of (at most) four natural number squares. Most
use the following counterpart of Lagrange’s lemma: any prime $p$ divides
a number of the form $l^2 + m^2 + 1$. The counterpart turns out to be easier.
What is harder is the four square identity discovered by Euler (1748b). It is
analogous to the two square identity of Section 6.1 but is much more
complicated (see Section 8.3). It can be mechanically checked by multiplying
out both sides, but what does it mean?

The Gaussian integer proof is favored in this book because $\mathbb{Z}[i]$ is a
natural structure and the two square identity is a natural part of it (the
multiplicative property of the norm) rather than an accidental identity of
formal expressions. In Chapter 8 we give a similar “structural” proof of
the four square theorem that uses the quaternion integers. These are a
remarkable four-dimensional structure from which the four square identity
emerges naturally as the multiplicative property of the quaternion norm.
Again, the key to the proof is unique prime factorization (or rather the
prime divisor property, which happens to be somewhat easier than unique
prime factorization in the quaternion case).

Fermat’s two square theorem was generalized in another direction by
Fermat himself. In 1654 Fermat announced similar theorems on primes of
the forms $x^2 + 2y^2$ and $x^2 + 3y^2$:

\begin{align*}
p = x^2 + 2y^2 &\iff p = 8n + 1 \text{ or } p = 8n + 3, \\
p = x^2 + 3y^2 &\iff p = 3n + 1.
\end{align*}


Our proof of the two square theorem in Section 6.5 can be adapted to
Fermat’s $x^2 + 2y^2$ and $x^2 + 3y^2$ theorems with the help of unique prime
factorization theorems for numbers of the forms $a + b\sqrt{-2}$ and $a + b\sqrt{-3}$
respectively. Such theorems will be proved in the next chapter.

The other thing we need to adapt is Lagrange’s lemma: if $p = 4n + 1$
then $p$ divides a number of the form $m^2 + 1$ for some $m \in \mathbb{Z}$. In Section
6.7 we described this lemma (together with its converse) as the quadratic
character of −1 because it says that −1 is congruent to a square mod $p$
precisely when $p = 4n + 1$.

To prove Fermat’s theorems on primes of the form $x^2 + 2y^2$ and $x^2 + 3y^2$
we similarly need the quadratic characters of −2 and −3. They are:

\begin{align*}
-2 \equiv \text{square} \pmod{p} &\iff p = 8n + 1 \text{ or } 8n + 3, \\
-3 \equiv \text{square} \pmod{p} &\iff p = 3n + 1.
\end{align*}
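These two quadratic characters can be checked by brute force for small primes. The sketch below restricts attention to primes p ≥ 5, an assumption made here so that −2 and −3 are nonzero mod p; the helper names are illustrative.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def is_square_mod(a, p):
    """True if a is congruent to x^2 (mod p) for some x not divisible by p."""
    return any((x * x - a) % p == 0 for x in range(1, p))

primes = [p for p in range(5, 200) if is_prime(p)]     # p >= 5 is an added restriction
minus2_ok = all(is_square_mod(-2, p) == (p % 8 in (1, 3)) for p in primes)
minus3_ok = all(is_square_mod(-3, p) == (p % 3 == 1) for p in primes)
print(minus2_ok, minus3_ok)
```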


Instead of finding quadratic characters one by one, in Chapter 9 we prove 
the sweeping law of quadratic reciprocity, which allows us to tell when 
any integer is congruent to a square mod p. Quadratic reciprocity was 
first observed by Euler and proved by him in special cases, such as those 
involved in Fermat’s theorems. The first general proof is due to Gauss 
(1801), and quadratic reciprocity has since been proved in many different 
ways. In fact, it has been proved more often than any other theorem except 
its distant ancestor, the Pythagorean theorem. 


PREVIEW 


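Just as Gaussian integers enable the factorization of $x^2 + y^2$, other
quadratic expressions in ordinary integer variables are factorized
with the help of quadratic integers. Examples in this chapter are

\[ x^2 + 2 = (x - \sqrt{-2})(x + \sqrt{-2}), \]

\[ x^2 - xy + y^2 = \left(x + \frac{-1 + \sqrt{-3}}{2}\,y\right)\left(x + \frac{-1 - \sqrt{-3}}{2}\,y\right). \]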


In the first example, the factors belong to

\[ \mathbb{Z}[\sqrt{-2}] = \{a + b\sqrt{-2} : a, b \in \mathbb{Z}\}. \]

Like the Gaussian integers $a + bi$, the numbers $a + b\sqrt{-2}$ enjoy
unique prime factorization. We use this property to find all (ordinary)
integer solutions of the equation $y^3 = x^2 + 2$.


The numbers $\frac{-1 + \sqrt{-3}}{2}$ and $\frac{-1 - \sqrt{-3}}{2}$ in the second example appear
at first to be “fractional”, and one might prefer to reserve the term
“integer” for numbers in

\[ \mathbb{Z}[\sqrt{-3}] = \{a + b\sqrt{-3} : a, b \in \mathbb{Z}\}. \]

However, unique prime factorization fails in $\mathbb{Z}[\sqrt{-3}]$, and it is
precisely by adjoining the number $\frac{-1 + \sqrt{-3}}{2}$ that it is regained.


This leads to a discussion (partly in the exercises) of the general
concept of quadratic integer and its applications. The chapter concludes
with two remarkable applications of $\sqrt{-3}$: parametric formulas for
the (infinitely many) rational solutions of $x^3 + y^3 = z^3 + w^3$,
and
7.1 The equation $y^3 = x^2 + 2$


Diophantus was usually interested in rational solutions of equations, and
the equation $y^3 = x^2 + 2$ has infinitely many of these (see Exercises 1.7.1–1.7.4).
But in his Arithmetica, Book VI, Problem 17, Diophantus mentioned
that $y^3 = x^2 + 2$ has the solution $x = 5$, $y = 3$. He evidently thought
that this natural number solution was interesting. In 1657 Fermat added
the claim that there is no other natural number solution of $y^3 = x^2 + 2$.

Fermat’s claim was proved by Euler (1770), assuming unique prime
factorization in

\[ \mathbb{Z}[\sqrt{-2}] = \{a + b\sqrt{-2} : a, b \in \mathbb{Z}\}. \]

Euler gave no proof of the latter fact (which is similar to the proof for
$\mathbb{Z}[i]$ given in Section 6.4), but leaving unique factorization aside for the
moment, his argument goes as follows.

Supposing that $y^3 = x^2 + 2$ for some $x, y \in \mathbb{Z}$, we have a factorization
in $\mathbb{Z}[\sqrt{-2}]$:

\[ y^3 = (x - \sqrt{-2})(x + \sqrt{-2}). \qquad (*) \]

Now we assume unique prime factorization in $\mathbb{Z}[\sqrt{-2}]$ and also that

\[ \gcd(x - \sqrt{-2}, x + \sqrt{-2}) = 1 \]

(another fact that we leave aside for the moment). By considering the prime
factorizations of both sides of (*), it then follows that the factors $x - \sqrt{-2}$
and $x + \sqrt{-2}$ are cubes in $\mathbb{Z}[\sqrt{-2}]$.
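The latter statement means that

\begin{align*}
x - \sqrt{-2} &= (a + b\sqrt{-2})^3 \quad \text{for some } a, b \in \mathbb{Z} \\
&= a^3 + 3a^2 b\sqrt{-2} - 6ab^2 - 2\sqrt{-2}\,b^3 \\
&= a^3 - 6ab^2 + (3a^2 b - 2b^3)\sqrt{-2}.
\end{align*}

Equating real and imaginary parts gives

\begin{align*}
x &= a^3 - 6ab^2, \\
1 &= 2b^3 - 3a^2 b = b(2b^2 - 3a^2).
\end{align*}

The latter equation says that the integer $b$ divides 1, hence $b = \pm 1$.
Then the other divisor $2b^2 - 3a^2 = 2 - 3a^2$ of 1 must equal $-1$ (it cannot
equal $+1$ for any integer $a$), hence $a = \pm b = \pm 1$.
This gives $x = \pm 5$ and hence $y = 3$. □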
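A finite search cannot replace the proof, but it is a useful sanity check on the conclusion. The sketch below looks for natural number solutions of y³ = x² + 2 with x below 100000; the bound and the helper `icbrt` are choices made here for illustration.

```python
def icbrt(n):
    """Integer cube root: the largest r with r**3 <= n, for n >= 0."""
    r = round(n ** (1.0 / 3.0))
    while r ** 3 > n:                      # correct any floating-point overshoot
        r -= 1
    while (r + 1) ** 3 <= n:               # or undershoot
        r += 1
    return r

solutions = []
for x in range(0, 100000):
    n = x * x + 2
    y = icbrt(n)
    if y ** 3 == n:
        solutions.append((x, y))
print(solutions)   # the theorem predicts only (5, 3)
```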
Exercises

There is a similar treatment (using $\mathbb{Z}[i]$) of the equation $y^3 = x^2 + 1$, which shows

7.1.1 Supposing that the factors $x \pm i$ on the right-hand side of

\[ y^3 = x^2 + 1 = (x - i)(x + i) \]

7.1.2 Deduce from Question 1 that $a = 0$, hence $x = 0$, hence $y = 1$.


A more challenging equation, which can also be mastered with the help of
$\mathbb{Z}[i]$, is $y^3 = x^2 + 4$. Fermat claimed that the only natural number solutions are

again without proving unique prime factorization. The argument goes along the
above lines when $x$ is odd, in which case it happens to be correct to assume that
the factors $x - 2i$ and $x + 2i$ are both cubes.

7.1.3 Assuming that $x \pm 2i$ are cubes $(a + bi)^3$ in $\mathbb{Z}[i]$, show that $2 = b(3a^2 - b^2)$
and deduce that the positive odd $x = 11$.


7.2 The division property in $\mathbb{Z}[\sqrt{-2}]$


Unique prime factorization follows the same way as in $\mathbb{Z}$ and $\mathbb{Z}[i]$: from
the prime divisor property that follows from a Euclidean algorithm, made
possible by a division property.

Division property for $\mathbb{Z}[\sqrt{-2}]$. For any $\alpha$, $\beta \neq 0$ in $\mathbb{Z}[\sqrt{-2}]$ there are $\mu$,
$\rho$ in $\mathbb{Z}[\sqrt{-2}]$ with

\[ \alpha = \mu\beta + \rho \quad \text{and} \quad |\rho| < |\beta|. \]


Proof. To see why the division property holds in $\mathbb{Z}[\sqrt{-2}]$ we look at the
multiples $\mu\beta$ of any nonzero $\beta \in \mathbb{Z}[\sqrt{-2}]$. These lie at the corners of
a grid of rectangles, the first of which has corners at $0$, $\beta$, $\beta\sqrt{-2}$ and
$\beta(1 + \sqrt{-2})$, shown in Figure 7.1.

Any $\alpha \in \mathbb{Z}[\sqrt{-2}]$ is in one of these rectangles and, as the picture shows,
the distance $|\rho|$ of $\alpha$ from the nearest multiple $\mu\beta$ of $\beta$ satisfies

\begin{align*}
|\rho|^2 &\le \left(\frac{|\beta|}{2}\right)^2 + \left(\frac{|\beta|}{\sqrt{2}}\right)^2 \quad \text{by Pythagoras’ theorem} \\
&= \frac{|\beta|^2 + 2|\beta|^2}{4} = \frac{3|\beta|^2}{4}.
\end{align*}

Figure 7.1: The division property in $\mathbb{Z}[\sqrt{-2}]$

Hence $|\rho| < |\beta|$, as required. □
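The geometric argument corresponds to a simple computation: divide in the complex numbers and round each coordinate to the nearest integer. The sketch below represents a + b√−2 as the pair (a, b); this representation and the names `mul2`, `norm2`, and `divmod_sqrt_minus2` are assumptions made here for illustration.

```python
def mul2(x, y):
    """Product in Z[sqrt(-2)]: (a + b*s)(c + d*s) with s*s = -2."""
    (a, b), (c, d) = x, y
    return (a * c - 2 * b * d, a * d + b * c)

def norm2(x):
    """Norm a^2 + 2b^2 of a + b*sqrt(-2)."""
    a, b = x
    return a * a + 2 * b * b

def divmod_sqrt_minus2(alpha, beta):
    """Return (mu, rho) with alpha = mu*beta + rho and norm(rho) <= (3/4) norm(beta)."""
    n = norm2(beta)
    t = mul2(alpha, (beta[0], -beta[1]))   # alpha * conj(beta), so alpha/beta = t/n
    mu = ((2 * t[0] + n) // (2 * n),       # round each coordinate of t/n
          (2 * t[1] + n) // (2 * n))       # to the nearest integer
    prod = mul2(mu, beta)
    rho = (alpha[0] - prod[0], alpha[1] - prod[1])
    return mu, rho

mu, rho = divmod_sqrt_minus2((7, 5), (2, 1))
print(mu, rho, norm2(rho), norm2((2, 1)))
```

Because each rounded coordinate is off by at most 1/2, the remainder satisfies the same bound 3|β|²/4 obtained in the proof.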


The units in $\mathbb{Z}[\sqrt{-2}]$, as in $\mathbb{Z}$, are just $\pm 1$. We prove this using the norm
$a^2 + 2b^2$ of $a + b\sqrt{-2}$. Units are elements of norm 1, and $a^2 + 2b^2 = 1$ only
if $b = 0$ and $a = \pm 1$.

Now suppose we have a factorization of a cube into relatively prime
factors $s$ and $t$ in $\mathbb{Z}[\sqrt{-2}]$:

\[ y^3 = st. \]

Since $s$ and $t$ have no prime factor in common, the cubed prime factors of
$y^3$ must separate into cubes inside $s$ and cubes inside $t$. There could also
be unit factors in $s$ and $t$, but these can only be 1 or −1, both of which are
cubes. Hence the relatively prime factors of a cube are themselves cubes.

This fills another gap in Euler’s solution of $y^3 = x^2 + 2$. The only gap
that now remains is to show that $\gcd(x - \sqrt{-2}, x + \sqrt{-2}) = 1$.


Exercises 
The equation $y^3 = x^2 + 1$ that we took up in the last exercise set calls for a similar
study of units in $\mathbb{Z}[i]$. Recall from Section 6.4 that the units of $\mathbb{Z}[i]$ are $\pm 1$, $\pm i$.

7.2.1 Check that each of the units in $\mathbb{Z}[i]$ is a cube.

7.2.2 Deduce from Exercise 7.2.1 and unique prime factorization that relatively
prime factors of a cube in $\mathbb{Z}[i]$ are themselves cubes.
The same properties of units and cubes apply to the equation $y^3 = x^2 + 4$
when we factorize its right-hand side into $(x - 2i)(x + 2i)$, but when $x$ is even, say
$x = 2X$, we have another problem.

7.2.3 Show that in this case $y$ is even, $y = 2Y$ say, and hence $y^3 = x^2 + 4$ is
equivalent to $2Y^3 = X^2 + 1 = (X - i)(X + i)$.
case $1 - i$ divides $X - i$.

It follows, by taking conjugates, that $1 + i$ divides $X + i$ for any odd $X$. In fact,
since $i(1 - i) = 1 + i$, the number $1 - i$ is a common divisor of $X - i$ and $X + i$ in
$\mathbb{Z}[i]$. In the next exercise set we see whether $1 - i$ is their gcd.


7.3 The gcd in $\mathbb{Z}[\sqrt{-2}]$
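Again we use the $\mathbb{Z}[\sqrt{-2}]$ norm

\[ \mathrm{norm}(a + b\sqrt{-2}) = |a + b\sqrt{-2}|^2 = a^2 + 2b^2, \]

which is multiplicative by the multiplicative property of absolute value.
Thus it is true, as in $\mathbb{Z}[i]$, that if $\alpha$ divides $\gamma$ then $\mathrm{norm}(\alpha)$ divides
$\mathrm{norm}(\gamma)$. Therefore, if $\delta$ is a common divisor of $\alpha$ and $\beta$, then $\mathrm{norm}(\delta)$ is
a common divisor of $\mathrm{norm}(\alpha)$ and $\mathrm{norm}(\beta)$.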
We can now return to the equation

\[ y^3 = x^2 + 2 = (x - \sqrt{-2})(x + \sqrt{-2}) \]

and compute $\gcd(x - \sqrt{-2}, x + \sqrt{-2})$.


Relative primality of the factors. If $x, y \in \mathbb{Z}$ are such that $y^3 = x^2 + 2$,
then $\gcd(x - \sqrt{-2}, x + \sqrt{-2}) = 1$.

Proof. If $y^3 = x^2 + 2$, then $x$ must be odd. Indeed, for even $x$ we have

\[ x^2 + 2 \equiv 2 \pmod{4}, \]

whereas

\[ y^3 \equiv 0, 1 \text{ or } 3 \pmod{4}. \]

This can be seen by trying $y \equiv 0, 1, 2, 3 \pmod{4}$. It follows that the norm
$x^2 + 2$ of $x \pm \sqrt{-2}$ is odd.
Now the gcd of $x - \sqrt{-2}$ and $x + \sqrt{-2}$ also divides their difference,
$2\sqrt{-2}$, which has norm 8. Hence the norm of the gcd divides both 8 and the
odd number $x^2 + 2$, and the gcd of 8 and $x^2 + 2$ is 1. An element of norm 1
is a unit. Therefore

\[ \gcd(x - \sqrt{-2}, x + \sqrt{-2}) = 1. \] □


We have now filled the gaps in Euler’s proof that $x = 5$, $y = 3$ is the only
natural number solution of $y^3 = x^2 + 2$: the cube $y^3$ factorizes into the
relatively prime numbers $x - \sqrt{-2}$ and $x + \sqrt{-2}$, which are cubes by unique
prime factorization in $\mathbb{Z}[\sqrt{-2}]$ and the fact that the units in $\mathbb{Z}[\sqrt{-2}]$ are
also cubes. We can therefore write $x - \sqrt{-2} = (a + b\sqrt{-2})^3$ and complete
the proof as already indicated in Section 7.1.


Exercises

We can similarly use $\mathbb{Z}[i]$ to complete the proof, begun in the previous exercise
sets, that $x = 0$, $y = 1$ is the only integer solution of $y^3 = x^2 + 1$.

7.3.1 Use congruence mod 4 to show that $x$ is even in any integer solution of
$y^3 = x^2 + 1$. From now on assume that $(x,y)$ is such a solution.

7.3.2 Explain why $\gcd(x - i, x + i) = \gcd(x + i, 2)$ and use Question 7.3.1 to show
that $\mathrm{norm}(x + i)$ is odd.

7.3.3 Deduce from Question 7.3.2 that $\gcd(x - i, x + i) = 1$.

7.3.4 Deduce, from the previous exercises and unique prime factorization in $\mathbb{Z}[i]$,
that the factors on the right-hand side of $y^3 = (x - i)(x + i)$ are cubes in
$\mathbb{Z}[i]$.


Likewise, we can find $\gcd(X - i, X + i)$ when $X$ is odd, and hence complete

7.3.5 Show that 2 does not divide $X - i$ or $X + i$, and deduce from Exercise 7.2.4
that $\gcd(X - i, X + i) = 1 - i$.

7.3.6 Use unique prime factorization in $\mathbb{Z}[i]$ to deduce from $2Y^3 = (X - i)(X + i)$
that

\[ X - i = (1 - i)(a - bi)^3 \quad \text{for some } a, b \in \mathbb{Z}. \]

7.3.7 Deduce from Exercise 7.3.6 that

\[ 1 = a^3 - b^3 + 3ab(a - b) = (a - b)(a^2 + 4ab + b^2) \]

and conclude that $X = \pm 1$, hence $x = \pm 2$.




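7.4 $\mathbb{Z}[\sqrt{-3}]$ and $\mathbb{Z}[\zeta_3]$

A natural step after investigating $\mathbb{Z}[i]$ and $\mathbb{Z}[\sqrt{-2}]$ would be to study

\[ \mathbb{Z}[\sqrt{-3}] = \{a + b\sqrt{-3} : a, b \in \mathbb{Z}\}. \]

But here is a surprise: unique prime factorization fails in $\mathbb{Z}[\sqrt{-3}]$.
Consider the following factorizations of 4:

\[ 4 = 2 \times 2 = (1 - \sqrt{-3})(1 + \sqrt{-3}). \]

In $\mathbb{Z}[\sqrt{-3}]$ the norm is

\[ \mathrm{norm}(a + b\sqrt{-3}) = |a + b\sqrt{-3}|^2 = a^2 + 3b^2 \]

and, as usual, if $\alpha$ divides $\gamma$ then $\mathrm{norm}(\alpha)$ divides $\mathrm{norm}(\gamma)$.
But now $\mathrm{norm}(2) = 4$, which is not divisible by any smaller integer of
the form $a^2 + 3b^2$ except 1, hence 2 is a prime of $\mathbb{Z}[\sqrt{-3}]$. And

\[ \mathrm{norm}(1 - \sqrt{-3}) = 1 + 3 = 4, \]

so $1 - \sqrt{-3}$ is also prime, as is $1 + \sqrt{-3}$. Thus 4 has two distinct prime
factorizations in $\mathbb{Z}[\sqrt{-3}]$. □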
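The two ingredients of this argument, the product (1 − √−3)(1 + √−3) = 4 and the absence of elements of norm 2, can be checked mechanically. In the sketch below a + b√−3 is represented as the pair (a, b); the names `mul3` and `norm3` are introduced here for illustration.

```python
def mul3(x, y):
    """Product in Z[sqrt(-3)]: (a + b*s)(c + d*s) with s*s = -3."""
    (a, b), (c, d) = x, y
    return (a * c - 3 * b * d, a * d + b * c)

def norm3(x):
    """Norm a^2 + 3b^2 of a + b*sqrt(-3)."""
    a, b = x
    return a * a + 3 * b * b

product = mul3((1, -1), (1, 1))            # (1 - sqrt(-3)) * (1 + sqrt(-3))

# An element of norm 2 would need |a| <= 1 and |b| <= 1, so this small
# search range is already exhaustive.
norm_two = [(a, b) for a in range(-2, 3) for b in range(-2, 3)
            if norm3((a, b)) == 2]
print(product, norm_two)
```

Since a proper factor of an element of norm 4 would have norm 2, the empty list confirms that 2 and 1 ± √−3 have no proper factors.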


This defect can be repaired by enlarging $\mathbb{Z}[\sqrt{-3}]$ to

\[ \mathbb{Z}[\zeta_3] = \{a + b\zeta_3 : a, b \in \mathbb{Z}\}, \]

where

\[ \zeta_3 = \frac{-1 + \sqrt{-3}}{2} \]

is one of the imaginary cube roots of 1. (This is why we use the subscript
3. In general, $\zeta_n$ denotes $\cos\frac{2\pi}{n} + i\sin\frac{2\pi}{n}$, an $n$th root of 1.) The elements
of $\mathbb{Z}[\zeta_3]$ lie at the corners of a tiling of the plane by equilateral triangles,
and are called the Eisenstein integers.

By geometric arguments like those used for $\mathbb{Z}[i]$ and $\mathbb{Z}[\sqrt{-2}]$, we can
see that $\mathbb{Z}[\zeta_3]$ has the division property, hence a Euclidean algorithm and
unique prime factorization. Figure 7.2 compares the point sets $\mathbb{Z}[\sqrt{-3}]$
and $\mathbb{Z}[\zeta_3]$ in the plane, showing why the division property fails for the first
and succeeds for the second.

In the rectangles of $\mathbb{Z}[\sqrt{-3}]$, each center point (such as the one at the
top of the triangle shown) is at distance 1 from the two nearest corners,
hence its distance from the nearest corner is not less than the smallest side
length of the rectangle. $\mathbb{Z}[\zeta_3]$ fills these “holes” in $\mathbb{Z}[\sqrt{-3}]$, producing the
tiling of the plane by equilateral triangles.

Figure 7.2: $\mathbb{Z}[\sqrt{-3}]$ (left) and $\mathbb{Z}[\zeta_3]$ (right)


Division property for $\mathbb{Z}[\zeta_3]$. For any $\alpha$, $\beta \neq 0$ in $\mathbb{Z}[\zeta_3]$ there are $\mu$, $\rho$ in
$\mathbb{Z}[\zeta_3]$ with

\[ \alpha = \mu\beta + \rho \quad \text{and} \quad |\rho| < |\beta|. \]

Proof. In the equilateral tiling, each point of the plane lies in some triangle
and its distance from the nearest vertex is less than the side length of the
triangle. In fact, its distance from any vertex of the surrounding triangle is
less than the side length. This is so because a circle centered on a vertex
and with a side as radius encloses the whole triangle.

This geometric property is the essence of the division property because,
as usual, the set of multiples $\mu\beta$ of any nonzero $\beta \in \mathbb{Z}[\zeta_3]$ is the same shape
as $\mathbb{Z}[\zeta_3]$. Its points are at the vertices of a tiling by equilateral triangles of
side length $|\beta|$. Hence the distance $|\rho| = |\alpha - \mu\beta|$ from any $\alpha \in \mathbb{Z}[\zeta_3]$ to
the nearest $\mu\beta$ is less than the side length $|\beta|$. □
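A computational counterpart of this proof is sketched below. Elements a + bζ₃ are stored as pairs (a, b), with the rule ζ₃² = −1 − ζ₃ and norm a² − ab + b²; rounding the two coordinates of α/β is a choice made here (it need not give the nearest multiple, but it still leaves a remainder of norm at most 3/4 of the norm of β). All function names are illustrative.

```python
def e_mul(x, y):
    """Product in Z[zeta_3], using zeta^2 = -1 - zeta."""
    (a, b), (c, d) = x, y
    return (a * c - b * d, a * d + b * c - b * d)

def e_norm(x):
    """Norm |a + b*zeta|^2 = a^2 - ab + b^2."""
    a, b = x
    return a * a - a * b + b * b

def e_conj(x):
    """Complex conjugate: conj(zeta) = zeta^2 = -1 - zeta."""
    a, b = x
    return (a - b, -b)

def e_divmod(alpha, beta):
    """Return (mu, rho) with alpha = mu*beta + rho and norm(rho) < norm(beta)."""
    n = e_norm(beta)
    t = e_mul(alpha, e_conj(beta))         # alpha / beta = t / n
    mu = ((2 * t[0] + n) // (2 * n),       # nearest integers to the
          (2 * t[1] + n) // (2 * n))       # two coordinates of t / n
    prod = e_mul(mu, beta)
    rho = (alpha[0] - prod[0], alpha[1] - prod[1])
    return mu, rho

mu, rho = e_divmod((7, 3), (2, -1))
print(mu, rho, e_norm(rho), e_norm((2, -1)))
```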


There are six units in $\mathbb{Z}[\zeta_3]$: $\pm 1$, $\pm\zeta_3$, and $\pm\zeta_3^2$, and they lie at vertices
of a hexagon on the unit circle with center at O (see Figure 7.2 again). Like
the units in $\mathbb{Z}$ and $\mathbb{Z}[i]$, they all divide 1. The two distinct factorizations of
4 in $\mathbb{Z}[\sqrt{-3}]$ are the “same” in $\mathbb{Z}[\zeta_3]$, up to unit factors, thanks to the extra
units in the latter (exercise).
Quadratic integers


It is satisfying to be able to repair the failure of unique prime factorization
in $\mathbb{Z}[\sqrt{-3}]$ by extending it to $\mathbb{Z}[\zeta_3]$. But is it reasonable to consider
$\zeta_3 = \frac{-1 + \sqrt{-3}}{2}$ an “integer”? The general definition, which allows us to say yes,
is the following.


Definition. A number $\alpha \in \mathbb{C}$ is an algebraic integer if it satisfies a monic
polynomial equation with integer coefficients, that is

\[ \alpha^m + a_{m-1}\alpha^{m-1} + \cdots + a_1\alpha + a_0 = 0 \quad \text{where } a_0, a_1, \ldots, a_{m-1} \in \mathbb{Z}. \]

A quadratic integer is an algebraic integer satisfying a monic quadratic
equation with integer coefficients.


We study the general concept of algebraic integer in Chapter 10, where
it is shown that the sum, difference and product of algebraic integers are
again algebraic integers. $\zeta_3$ is an algebraic integer because it satisfies
$x^3 - 1 = 0$. In fact, $\zeta_3$ is a quadratic integer because it satisfies the equation
$x^2 + x + 1 = 0$ obtained by factorizing $x^3 - 1$. All elements of $\mathbb{Z}[\zeta_3]$, being
obtained from the algebraic integers 1 and $\zeta_3$ by sum and difference, are
algebraic integers. It can be shown directly that they are quadratic (exercise).

Closure under +, —, and x is certainly a natural requirement for inte- 
gers, but perhaps the definition of algebraic integer goes too far and admits 
numbers that should not be considered integers. One reason that it does 
not is the following: every rational algebraic integer is an ordinary inte- 
ger. This is crucial when results about ordinary integers are being derived 
as special cases of results about algebraic integers. 


Rational algebraic integers. If a rational number $r$ satisfies a monic
polynomial equation with integer coefficients, then $r$ is an ordinary integer.


Proof. Suppose that $r = s/t$, where $s, t \in \mathbb{Z}$ and $\gcd(s,t) = 1$, and suppose
that $r$ satisfies the equation

\[ x^m + a_{m-1}x^{m-1} + \cdots + a_1 x + a_0 = 0, \quad \text{where } a_0, a_1, \ldots, a_{m-1} \in \mathbb{Z}. \]

Substituting $s/t$ for $x$ and taking all terms but $x^m$ to the right-hand side we
get

\[ \frac{s^m}{t^m} = -a_{m-1}\frac{s^{m-1}}{t^{m-1}} - \cdots - a_1\frac{s}{t} - a_0. \]

Multiplying both sides by $t^m$ gives

\begin{align*}
s^m &= -a_{m-1}s^{m-1}t - \cdots - a_1 s t^{m-1} - a_0 t^m \\
&= -(a_{m-1}s^{m-1} + \cdots + a_1 s t^{m-2} + a_0 t^{m-1})\,t. \qquad (*)
\end{align*}

Since $\gcd(s,t) = 1$, any prime factor of $t$ divides the right-hand side but not
the left. Hence $t = \pm 1$ by unique prime factorization in $\mathbb{Z}$, and therefore $r$
is an ordinary integer. □
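The theorem can be illustrated by brute force: search for rational roots s/t of a monic integer polynomial and observe that every root found has denominator 1. The sketch below uses exact rational arithmetic; the search bound and the example polynomials are arbitrary choices made here.

```python
from fractions import Fraction

def poly_eval(coeffs, x):
    """Evaluate a polynomial (highest-degree coefficient first) by Horner's rule."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value

def rational_roots_monic(coeffs, bound=20):
    """Rational roots s/t with |s| <= bound and 1 <= t <= bound of a monic polynomial."""
    assert coeffs[0] == 1, "polynomial must be monic"
    roots = set()
    for t in range(1, bound + 1):
        for s in range(-bound, bound + 1):
            r = Fraction(s, t)
            if poly_eval(coeffs, r) == 0:
                roots.add(r)
    return roots

# x^3 - 2x^2 - 5x + 6 = (x - 1)(x + 2)(x - 3)
roots = rational_roots_monic([1, -2, -5, 6])
print(sorted(roots), all(r.denominator == 1 for r in roots))

# x^2 - 2 has no rational root in the search range, in line with the
# irrationality of the square root of 2.
print(rational_roots_monic([1, 0, -2]))
```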


Remark. It follows from this result that the roots of a monic polynomial
equation with integer coefficients are either integers or irrationals. This
generalizes what was proved in Section 5.1, namely that $\sqrt{n}$ is irrational when
$n$ is a nonsquare natural number, because $\sqrt{n}$ is a root of the equation
$x^2 - n = 0$.


Exercises 


The two factorizations of 4 found in $\mathbb{Z}[\sqrt{-3}]$, $2 \times 2$ and $(1 - \sqrt{-3})(1 + \sqrt{-3})$,
differ only by unit factors in $\mathbb{Z}[\zeta_3]$.

7.4.1 Find units $u$ and $\bar{u}$ in $\mathbb{Z}[\zeta_3]$ such that $2u \times 2\bar{u} = (1 - \sqrt{-3})(1 + \sqrt{-3})$.

7.4.2 Check that the units $\pm\zeta_3$, $\pm\zeta_3^2$ of $\mathbb{Z}[\zeta_3]$ satisfy monic polynomial equations
with integer coefficients.


7.5 *Rational solutions of $x^3 + y^3 = z^3 + w^3$


Like the Pythagorean equation $x^2 + y^2 = z^2$, the equation $x^3 + y^3 = z^3 + w^3$
has infinitely many integer solutions, some of them of great renown. Here
is one, associated with the great Indian number theorist Ramanujan.


It was Littlewood who said that every positive integer was one of
Ramanujan’s personal friends. I remember going to see him once
when he was lying ill at Putney. I had ridden in taxi-cab number
1729, and remarked that the number seemed to me rather a dull one,
and that I hoped it was not an unfavorable omen. “No,” he replied,
“it is a very interesting number; it is the smallest number expressible
as the sum of two cubes in two different ways.”

Hardy (1937).
The “two different ways” referred to by Ramanujan are

\[ 1729 = 9^3 + 10^3 = 1^3 + 12^3, \]

thus they correspond to an integer solution of $x^3 + y^3 = z^3 + w^3$. According
to Brouncker (1657), the same solution was found by Frenicle, along with

\begin{align*}
9^3 + 15^3 &= 2^3 + 16^3, \\
15^3 + 33^3 &= 2^3 + 34^3, \\
16^3 + 33^3 &= 9^3 + 34^3, \\
19^3 + 24^3 &= 10^3 + 27^3.
\end{align*}

Another startling solution of this equation is $x = 3$, $y = 4$, $z = -5$, $w = 6$;
in other words $3^3 + 4^3 + 5^3 = 6^3$. This result, which seems to “generalize”
$3^2 + 4^2 = 5^2$, really belongs to the equation $x^3 + y^3 = z^3 + w^3$, but the latter
resembles the Pythagorean equation in one respect: there is a parametric
formula for all its rational solutions.
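Ramanujan’s remark and Frenicle’s list can be confirmed by a short search over sums of two positive cubes. The sketch below is illustrative; the limit 50000 and the function name `two_cube_sums` are choices made here.

```python
from collections import defaultdict

def two_cube_sums(limit):
    """Map n -> list of pairs (a, b), 1 <= a <= b, with a^3 + b^3 = n <= limit."""
    sums = defaultdict(list)
    a = 1
    while 2 * a ** 3 <= limit:
        b = a
        while a ** 3 + b ** 3 <= limit:
            sums[a ** 3 + b ** 3].append((a, b))
            b += 1
        a += 1
    return sums

sums = two_cube_sums(50000)
# Numbers that are sums of two positive cubes in at least two ways.
taxicab = sorted(n for n, pairs in sums.items() if len(pairs) >= 2)
print(taxicab[0], sums[taxicab[0]])
for n in (4104, 39312, 40033, 20683):      # the four values from Frenicle's list
    print(n, sums[n])
```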
The formula is due to Euler (1756). His method can be simplified using
complex numbers, namely the norm in $\mathbb{Q}[\sqrt{-3}] = \{a + b\sqrt{-3} : a, b \in \mathbb{Q}\}$.


Parametric solution of $x^3 + y^3 = z^3 + w^3$. The rational solutions are

\begin{align*}
x &= [(p + 3q)(p^2 + 3q^2) - 1]\,r, \\
y &= [(-p + 3q)(p^2 + 3q^2) + 1]\,r, \\
z &= [-p + 3q + (p^2 + 3q^2)^2]\,r, \\
w &= [p + 3q - (p^2 + 3q^2)^2]\,r,
\end{align*}

where $p$, $q$ and $r$ run through all rational numbers.


Proof. If we make the substitutions $x = X + Y$, $y = X - Y$, $z = Z + W$,
$w = Z - W$, then the equation $x^3 + y^3 = z^3 + w^3$ becomes

\[ X(X^2 + 3Y^2) = Z(Z^2 + 3W^2), \]

and $X, Y, Z, W$ are rational if and only if $x, y, z, w$ are.

Thus the problem is equivalent to finding the rational solutions of the
equation $X(X^2 + 3Y^2) = Z(Z^2 + 3W^2)$. Also, we can specialize to $Z = 1$
(if we later multiply the solution by an arbitrary rational), so it suffices to
find the rational solutions of

\[ X = \frac{1 + 3W^2}{X^2 + 3Y^2}. \]

Now $a^2 + 3b^2 = |a + b\sqrt{-3}|^2$. Hence

\begin{align*}
X &= \frac{|1 + W\sqrt{-3}|^2}{|X + Y\sqrt{-3}|^2} \\
&= \left|\frac{1 + W\sqrt{-3}}{X + Y\sqrt{-3}}\right|^2 \quad \text{since the norm is multiplicative} \\
&= |p + q\sqrt{-3}|^2 = p^2 + 3q^2,
\end{align*}

for some $p$, $q$ that are rational if $X$, $Y$, $W$ are.
We can define $p$ and $q$ by taking real and imaginary parts in

\[ p + q\sqrt{-3} = \frac{1 + W\sqrt{-3}}{X + Y\sqrt{-3}}. \]

Multiplying both sides by $X + Y\sqrt{-3}$ gives

\[ pX - 3qY + (qX + pY)\sqrt{-3} = 1 + W\sqrt{-3}, \]

and therefore, equating real and imaginary parts,

\[ pX - 3qY = 1, \quad qX + pY = W. \]


Since these are linear equations in Y and W, we can solve for Y and W 
rationally in terms of p, g, and X = p* +3gq7. This gives a 1-to-1 corre- 
spondence between rational pairs (p,q) and rational triples (X,Y,W) such 
that X(X7+3¥7) =1+3W?. 

Substituting these values of X, Y, Z = 1, and W back in x, y, z, w 
we find that the rational solutions of x»° + y? = z>-+w°® are all the rational 
multiples of 


x = (p+3q)(p- +3q°)—1, 
y= (—p+3q)(p> +3q°) +1, 
= —pt+3qt+(p+3q°)’, 
w = p+3q— (p> +3q°)’, 


as claimed. O 


Example

p = 1, q = 1 gives

\[ 15^3 + 9^3 = 18^3 + (-12)^3, \]

which is a multiple (and rearrangement) of the equation $3^3 + 4^3 + 5^3 = 6^3$.
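
The parametric formula can be spot-checked numerically. The following is a minimal sketch, not part of the original text: the function name `euler_solution` and the sample grid of rationals are illustrative assumptions. It evaluates the four expressions with exact rational arithmetic and confirms $x^3 + y^3 = z^3 + w^3$ on the sample, then reproduces the example p = q = 1.

```python
from fractions import Fraction
from itertools import product

def euler_solution(p, q, r=1):
    """Return (x, y, z, w) from the parametric formula, scaled by r.

    The name and signature are hypothetical, chosen only for this sketch.
    """
    s = p * p + 3 * q * q              # the norm p^2 + 3q^2 of p + q*sqrt(-3)
    x = ((p + 3 * q) * s - 1) * r
    y = ((-p + 3 * q) * s + 1) * r
    z = (-p + 3 * q + s * s) * r
    w = (p + 3 * q - s * s) * r
    return x, y, z, w

# Exact check of x^3 + y^3 = z^3 + w^3 on a small grid of rational p, q.
values = [Fraction(n, d) for n in range(-3, 4) for d in (1, 2, 3)]
for p, q in product(values, repeat=2):
    x, y, z, w = euler_solution(p, q, Fraction(5, 7))
    assert x**3 + y**3 == z**3 + w**3

print(euler_solution(1, 1))   # (15, 9, 18, -12), the example above
```

A finite check of this kind is only evidence, of course; the proof above is what shows that the formula gives all rational solutions.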
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Exercises 


7.5.1 Find a simpler p and q that also give (a multiple of) $3^3 + 4^3 + 5^3 = 6^3$.


yields a parametric integer solution. However, Davenport (1960), p. 162, gives an 
infinite class of integer solutions discovered by Mahler in 1936. 


7.5.2 By setting p = 3q and then making a linear change of variable from q to t,
derive the infinite family of integer solutions


$w = 3t - 9t^4$.


7.5.3 Find values of t that give $1^3 + 6^3 + 8^3 = 9^3$ and $9^3 + 10^3 = 1^3 + 12^3$.


7.6 *The prime $\sqrt{-3}$ in $\mathbb{Z}[\zeta_3]$


Perhaps the most important Diophantine equation that can be analyzed with
the help of $\mathbb{Z}[\zeta_3]$ is the Fermat equation

\[ x^3 + y^3 = z^3. \qquad (*) \]

By doing so we settle the n = 3 case of Fermat’s last theorem: $x^n + y^n \neq z^n$
for natural numbers x, y, z and n > 2.

Supposing, for the sake of contradiction, that (*) holds for some natural
numbers x, y, z, we factorize the left-hand side:

\[
\begin{aligned}
x^3 + y^3 &= (x+y)(x^2 - xy + y^2) &&\text{in } \mathbb{Z}\\
          &= (x+y)(x+\zeta_3 y)(x+\zeta_3^2 y) &&\text{in } \mathbb{Z}[\zeta_3].
\end{aligned}
\]

If x, y, z are relatively prime, one might then hope that $x+y$, $x+\zeta_3 y$, $x+\zeta_3^2 y$
are also relatively prime. If so, we could use unique prime factorization in
$\mathbb{Z}[\zeta_3]$ to conclude that $x+y$, $x+\zeta_3 y$, $x+\zeta_3^2 y$ are units times cubes and plan
to derive a contradiction by this route.

Surprisingly, the very assumption that $x^3 + y^3 = z^3$ forces a factor $\sqrt{-3}$
into the equation. By suitable naming of terms, $\sqrt{-3}$ divides z and each
of $x+y$, $x+\zeta_3 y$, and $x+\zeta_3^2 y$. This ruins the original plan but suggests a
new one: divide both sides of the equation (*) by $(\sqrt{-3})^3$ and build a new
equation of the same form but “with fewer factors of $\sqrt{-3}$”. By slightly
generalizing the Fermat equation, the new plan can be made to work. It
leads to a contradiction by infinite descent because an integer equation in
$\mathbb{Z}[\zeta_3]$ cannot be divided by the integer $\sqrt{-3}$ indefinitely.

To see how $\sqrt{-3}$ insinuates itself into the Fermat equation (*), we first
develop a few of its basic properties. These involve congruence mod $\sqrt{-3}$,
where, as usual,

\[ \sigma \equiv \tau \pmod{\nu} \]

means that $\nu$ divides $\sigma - \tau$. In particular, $\sigma \equiv 0 \pmod{\sqrt{-3}}$ means that
$\sqrt{-3}$ divides $\sigma$.

Figure 7.3 shows the congruence classes mod $\sqrt{-3}$ in $\mathbb{Z}[\zeta_3]$. There are
three of them: the classes of 0 (black), 1 (white), and −1 (grey).


Figure 7.3: The congruence classes mod $\sqrt{-3}$


This can be checked by calculation. It suffices to look at the possible
remainders on division by $\sqrt{-3}$, which have absolute value $< \sqrt{3}$. We can
now prove the following properties:


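Cubes mod 9. For any $\sigma \in \mathbb{Z}[\zeta_3]$, if $\sigma \not\equiv 0 \pmod{\sqrt{-3}}$ then $\sigma^3 \equiv \pm 1 \pmod 9$.


Proof. Since $\sigma \not\equiv 0$, $\sigma \equiv \pm 1 \pmod{\sqrt{-3}}$. Choose $\tau = \pm\sigma$ with $\tau \equiv 1
\pmod{\sqrt{-3}}$, so $\tau = 1 + \mu\sqrt{-3}$ for some $\mu \in \mathbb{Z}[\zeta_3]$. Then

\[
\begin{aligned}
\tau^3 - 1 &= (1 + \mu\sqrt{-3})^3 - 1\\
 &= 3\mu\sqrt{-3} + 3(\mu\sqrt{-3})^2 + (\mu\sqrt{-3})^3\\
 &= 3\sqrt{-3}\,(\mu + \mu^2\sqrt{-3} - \mu^3)\\
 &\equiv 3\sqrt{-3}\,(\mu - \mu^3) \pmod 9\\
 &= -3\sqrt{-3}\,\mu(\mu - 1)(\mu + 1).
\end{aligned}
\]

(The congruence holds because $3\sqrt{-3}\cdot\mu^2\sqrt{-3} = -9\mu^2$.) Now $\mu$, $\mu - 1$ and $\mu + 1$
are in different congruence classes mod $\sqrt{-3}$, hence one
of them is divisible by $\sqrt{-3}$. Thus $\tau^3 - 1$ is divisible by $-3\cdot\sqrt{-3}\cdot\sqrt{-3} = 9$.
That is, $\tau^3 \equiv 1 \pmod 9$, and therefore $\sigma^3 \equiv \pm 1 \pmod 9$. □

It follows from this property that if $\alpha, \beta, \gamma \in \mathbb{Z}[\zeta_3]$ and

\[ \alpha^3 + \beta^3 + \gamma^3 = 0 \qquad (**) \]

(an equivalent of $x^3 + y^3 = z^3$, since $-z^3 = (-z)^3$) then $\sqrt{-3}$ divides at
least one of $\alpha, \beta, \gamma$. If not, then taking congruence mod 9 gives

\[ \pm 1 \pm 1 \pm 1 \equiv 0 \pmod 9. \]

It is easily checked that this is impossible for all eight combinations of signs.
By suitable renaming of $\alpha, \beta, \gamma$, if necessary, we can assume that $\gamma$ is
divisible by $\sqrt{-3}$, say $\gamma = \delta(\sqrt{-3})^n$. Then the three cubes equation (**)
becomes

\[ \alpha^3 + \beta^3 + \delta^3(\sqrt{-3})^{3n} = 0, \qquad ({*}{*}{*}) \]

and a second important property of congruence mod $\sqrt{-3}$ comes into play.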


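Congruence of factors in a sum of two cubes. For any $\alpha, \beta \in \mathbb{Z}[\zeta_3]$, the
factors $\alpha + \beta$, $\alpha + \zeta_3\beta$, $\alpha + \zeta_3^2\beta$ of $\alpha^3 + \beta^3$ are congruent mod $\sqrt{-3}$.


Proof. Since $\zeta_3 = \frac{-1 + \sqrt{-3}}{2}$,

\[
\begin{aligned}
\alpha + \zeta_3\beta &= \alpha + \frac{-1 + \sqrt{-3}}{2}\,\beta = \alpha + \beta + \frac{-3 + \sqrt{-3}}{2}\,\beta\\
 &= \alpha + \beta + \frac{1 + \sqrt{-3}}{2}\,\beta\sqrt{-3}\\
 &\equiv \alpha + \beta \pmod{\sqrt{-3}}.
\end{aligned}
\]

Similarly,

\[ \alpha + \zeta_3^2\beta = \alpha + \frac{-1 - \sqrt{-3}}{2}\,\beta \equiv \alpha + \beta \pmod{\sqrt{-3}}. \qquad \Box \]


We now apply this property to the factorized form of equation (***):

\[ (\alpha + \beta)(\alpha + \zeta_3\beta)(\alpha + \zeta_3^2\beta) + \delta^3(\sqrt{-3})^{3n} = 0. \]

The number $\sqrt{-3}$ is prime in $\mathbb{Z}[\zeta_3]$ because its norm 3 is prime in $\mathbb{Z}$. Since
$\sqrt{-3}$ divides $\delta^3(\sqrt{-3})^{3n}$, it also divides at least one of the factors $\alpha + \beta$,
$\alpha + \zeta_3\beta$, $\alpha + \zeta_3^2\beta$. But then it divides all of them, since they are congruent
mod $\sqrt{-3}$. Altogether we get: if numbers $\alpha, \beta, \gamma \in \mathbb{Z}[\zeta_3]$ satisfy

\[ \alpha^3 + \beta^3 + \gamma^3 = 0, \]

then (with suitable renaming of numbers if necessary) $\sqrt{-3}$ divides $\gamma$ and
all the factors $\alpha + \beta$, $\alpha + \zeta_3\beta$, $\alpha + \zeta_3^2\beta$ in $\alpha^3 + \beta^3$.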
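
The two properties of this section lend themselves to a brute-force check on small elements. The sketch below is an illustration under stated assumptions, not the book's method: an element $a + b\zeta_3$ is stored as the integer pair `(a, b)`, multiplication uses $\zeta_3^2 = -1 - \zeta_3$, and divisibility by $\sqrt{-3} = 1 + 2\zeta_3$ is tested via the norm, since $a^2 - ab + b^2 \equiv (a+b)^2 \pmod 3$. All function names are hypothetical.

```python
from itertools import product

# a + b*zeta is stored as (a, b); zeta^2 = -1 - zeta (representation assumed here).
def add(s, t):
    return (s[0] + t[0], s[1] + t[1])

def sub(s, t):
    return (s[0] - t[0], s[1] - t[1])

def mul(s, t):
    a, b = s
    c, d = t
    return (a * c - b * d, a * d + b * c - b * d)

ZETA = (0, 1)
ZETA2 = mul(ZETA, ZETA)          # (-1, -1)
SQRT_M3 = (1, 2)                 # sqrt(-3) = 1 + 2*zeta
assert mul(SQRT_M3, SQRT_M3) == (-3, 0)

def divisible_by_sqrt_m3(s):
    # sqrt(-3) is the prime of norm 3, and norm(a + b*zeta) = (a + b)^2 mod 3
    return (s[0] + s[1]) % 3 == 0

# Cubes mod 9: if sqrt(-3) does not divide sigma, then sigma^3 = +-1 mod 9.
for sigma in product(range(-6, 7), repeat=2):
    if not divisible_by_sqrt_m3(sigma):
        a, b = mul(sigma, mul(sigma, sigma))
        assert b % 9 == 0 and a % 9 in (1, 8)

# +-1 +-1 +-1 is never 0 mod 9, for all eight sign choices.
assert all(sum(signs) % 9 != 0 for signs in product((1, -1), repeat=3))

# The three factors of alpha^3 + beta^3 are congruent mod sqrt(-3).
small = list(product(range(-3, 4), repeat=2))
for alpha, beta in product(small, repeat=2):
    f0 = add(alpha, beta)
    f1 = add(alpha, mul(ZETA, beta))
    f2 = add(alpha, mul(ZETA2, beta))
    assert divisible_by_sqrt_m3(sub(f1, f0))
    assert divisible_by_sqrt_m3(sub(f2, f0))
```

The ranges are arbitrary; the proofs above are what cover all of $\mathbb{Z}[\zeta_3]$.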
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Exercises 


In other approaches to the equation $\alpha^3 + \beta^3 + \gamma^3 = 0$ that I know (Nagell (1951),
p. 241, Grosswald (1966), p. 169, Redmond (1996), p. 697, and the outline in
Baker (1984), p. 86) the number $\lambda = 1 - \zeta_3$ is used in place of $\sqrt{-3}$. This is
probably because the equation $\alpha^n + \beta^n + \gamma^n = 0$ can be similarly treated using $\lambda$ =


7.6.1 Show that $\lambda$ equals a unit times $\sqrt{-3}$.
7.6.2 Deduce from Exercise 7.6.1 that $\sigma \equiv \tau \pmod{\lambda} \Leftrightarrow \sigma \equiv \tau \pmod{\sqrt{-3}}$ and


7.7 *Fermat’s last theorem for n = 3


We now justify the hunch that the equation $\alpha^3 + \beta^3 + \gamma^3 = 0$ is impossible
for $\alpha, \beta, \gamma \in \mathbb{Z}[\zeta_3]$ because it seemingly admits unlimited division by $\sqrt{-3}$.
We suppose that $\gamma$ is the term divisible by $\sqrt{-3}$, so $\gamma = (\sqrt{-3})^n\delta$ for some
$\delta$ not divisible by $\sqrt{-3}$. Then the equation can be written

\[ \alpha^3 + \beta^3 + (\sqrt{-3})^{3n}\delta^3 = 0 \]

for some natural number n that we suppose to be as small as possible. In
fact we must have $n \geq 2$. This is so because $\alpha$, $\beta$, and $\gamma$ are relatively
prime, hence $\alpha$ and $\beta$, like $\delta$, are not divisible by $\sqrt{-3}$. But then each is $\equiv
\pm 1 \pmod{\sqrt{-3}}$ by the enumeration of congruence classes in the previous
section. Hence if n = 1 and we reduce the equation mod 9, the property of
cubes mod 9 gives

\[ \pm 1 \pm 1 \pm (\sqrt{-3})^3 \equiv 0 \pmod 9, \]

which is clearly impossible for any combination of signs.
We can therefore assume that $n \geq 2$. To enable repeated division by
$\sqrt{-3}$ we assume that a slightly more general equation holds:

\[ \alpha^3 + \beta^3 + \varepsilon(\sqrt{-3})^{3n}\delta^3 = 0, \qquad (*) \]

where $\alpha, \beta, \delta \in \mathbb{Z}[\zeta_3]$ are relatively prime and $\varepsilon$ is a unit of $\mathbb{Z}[\zeta_3]$. The unit
$\varepsilon$ is there because, as we shall see, division introduces units that cannot be
completely eliminated.

Impossibility of $\alpha^3 + \beta^3 + \varepsilon(\sqrt{-3})^{3n}\delta^3 = 0$. When $\alpha, \beta, \delta \in \mathbb{Z}[\zeta_3]$ are
relatively prime and not divisible by $\sqrt{-3}$, $n \geq 2$, and $\varepsilon$ is a unit,

\[ \alpha^3 + \beta^3 + \varepsilon(\sqrt{-3})^{3n}\delta^3 \neq 0. \]


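Proof. Suppose on the contrary that there are relatively prime $\alpha, \beta, \delta$,
not divisible by $\sqrt{-3}$, such that equation (*) holds. It follows from unique
prime factorization in $\mathbb{Z}[\zeta_3]$ that the prime $\sqrt{-3}$ divides $\alpha^3 + \beta^3$, and hence
$\sqrt{-3}$ divides at least one of its factors $\alpha + \beta$, $\alpha + \zeta_3\beta$, $\alpha + \zeta_3^2\beta$.

But all of these factors are congruent mod $\sqrt{-3}$, as we found in the last
section, so in fact $\sqrt{-3}$ divides them all, and therefore

\[ \frac{\alpha + \beta}{\sqrt{-3}},\quad \frac{\alpha + \zeta_3\beta}{\sqrt{-3}},\quad \frac{\alpha + \zeta_3^2\beta}{\sqrt{-3}} \in \mathbb{Z}[\zeta_3]. \]

These three elements have no common prime divisor in $\mathbb{Z}[\zeta_3]$. For example,
a common divisor of $\frac{\alpha+\beta}{\sqrt{-3}}$ and $\frac{\alpha+\zeta_3\beta}{\sqrt{-3}}$ also divides their difference,

\[ \frac{1 - \zeta_3}{\sqrt{-3}}\,\beta = \frac{3 - \sqrt{-3}}{2\sqrt{-3}}\,\beta = -\frac{1 + \sqrt{-3}}{2}\,\beta = \text{unit} \times \beta. \]

Hence a common divisor of $\frac{\alpha+\beta}{\sqrt{-3}}$ and $\frac{\alpha+\zeta_3\beta}{\sqrt{-3}}$ divides $\beta$. One finds that it
also divides $\alpha$ by similarly considering $\zeta_3\frac{\alpha+\beta}{\sqrt{-3}} - \frac{\alpha+\zeta_3\beta}{\sqrt{-3}}$. But there is no
common prime divisor of $\alpha$ and $\beta$, hence none of $\frac{\alpha+\beta}{\sqrt{-3}}$ and $\frac{\alpha+\zeta_3\beta}{\sqrt{-3}}$. Similar
algebra shows the same for the other pairs from $\frac{\alpha+\beta}{\sqrt{-3}}$, $\frac{\alpha+\zeta_3\beta}{\sqrt{-3}}$, and $\frac{\alpha+\zeta_3^2\beta}{\sqrt{-3}}$.

Thus we can apply unique prime factorization to the following rearrangement
of equation (*),

\[ \frac{\alpha + \beta}{\sqrt{-3}}\cdot\frac{\alpha + \zeta_3\beta}{\sqrt{-3}}\cdot\frac{\alpha + \zeta_3^2\beta}{\sqrt{-3}} = -\varepsilon(\sqrt{-3})^{3n-3}\delta^3, \]

to conclude that each factor on the left is a unit times a cube, say

\[ \frac{\alpha + \beta}{\sqrt{-3}} = \varepsilon_1\alpha_1^3,\qquad \frac{\alpha + \zeta_3\beta}{\sqrt{-3}} = \varepsilon_2\beta_1^3,\qquad \frac{\alpha + \zeta_3^2\beta}{\sqrt{-3}} = \varepsilon_3\gamma_1^3, \]

with $\alpha_1, \beta_1, \gamma_1$ relatively prime because the three quotients are. It follows
that the prime power $(\sqrt{-3})^{3n-3}$ resides in exactly one of $\alpha_1^3$, $\beta_1^3$, $\gamma_1^3$. By
renaming, if necessary, we can assume that it is in $\gamma_1^3 = (\sqrt{-3})^{3n-3}\delta_1^3$.

Now notice the delightful fact that

\[ \frac{\alpha + \beta}{\sqrt{-3}} + \zeta_3\,\frac{\alpha + \zeta_3\beta}{\sqrt{-3}} + \zeta_3^2\,\frac{\alpha + \zeta_3^2\beta}{\sqrt{-3}} = 0 \quad\text{because } 1 + \zeta_3 + \zeta_3^2 = 0. \]

In terms of $\alpha_1, \beta_1, \delta_1$, this fact is

\[ \varepsilon_1\alpha_1^3 + \zeta_3\varepsilon_2\beta_1^3 + \zeta_3^2\varepsilon_3(\sqrt{-3})^{3n-3}\delta_1^3 = 0, \]

which, when divided by the unit $\varepsilon_1$, takes the form

\[ \alpha_1^3 + \varepsilon_4\beta_1^3 + \varepsilon_5(\sqrt{-3})^{3n-3}\delta_1^3 = 0. \qquad (**) \]

Here $\varepsilon_4, \varepsilon_5$ are units and $\alpha_1, \beta_1, \delta_1$ are relatively prime and not divisible
by $\sqrt{-3}$. Equation (**) is of almost the same form as (*), except for the
presence of the unit $\varepsilon_4$. Fortunately we can show that $\varepsilon_4 = \pm 1$ as follows.

Since $n \geq 2$, $(\sqrt{-3})^{3n-3}$ is divisible by $3\sqrt{-3}$ whereas $\alpha_1^3, \beta_1^3 \equiv \pm 1$
mod 9 (by the property of cubes mod 9) and hence also mod $3\sqrt{-3}$. Thus
reducing (**) mod $3\sqrt{-3}$ gives

\[ \pm 1 \pm \varepsilon_4 \equiv 0 \pmod{3\sqrt{-3}}. \]

The only units satisfying this congruence are $\varepsilon_4 = \pm 1$, as required.

Equation (**) is therefore of the simpler form

\[ \alpha_1^3 \pm \beta_1^3 + \varepsilon_5(\sqrt{-3})^{3n-3}\delta_1^3 = 0 \qquad ({*}{*}{*}) \]

with $\alpha_1, \beta_1, \delta_1$ relatively prime and not divisible by $\sqrt{-3}$. Since $-\beta_1^3 =
(-\beta_1)^3$, (***) is indeed of the same form as (*), except that the exponent
of $\sqrt{-3}$ is less.

This contradicts the assumption that the exponent of $\sqrt{-3}$ in (*) is as
small as possible, hence (*) does not hold. □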


Corollary. The equation $x^3 + y^3 = z^3$ is impossible for integers $x, y, z \neq 0$.

Proof. Suppose on the contrary that $x^3 + y^3 = z^3$ for integers $x, y, z \neq 0$.
Dividing by any common divisor in $\mathbb{Z}[\zeta_3]$ we obtain an equation

\[ \alpha^3 + \beta^3 + \gamma^3 = 0 \quad\text{with } \alpha, \beta, \gamma \in \mathbb{Z}[\zeta_3] \text{ relatively prime.} \]

By suitably renaming the numbers, if necessary, we can assume that the
multiple of $\sqrt{-3}$ is $\gamma = (\sqrt{-3})^n\delta$, where $\delta$ is not divisible by $\sqrt{-3}$. We
then have an equation

\[ \alpha^3 + \beta^3 + (\sqrt{-3})^{3n}\delta^3 = 0, \]

where $\alpha, \beta, \delta \in \mathbb{Z}[\zeta_3]$ are relatively prime and not divisible by $\sqrt{-3}$. This
is a special case of the equation just proved to be impossible, therefore
$x^3 + y^3 = z^3$ is impossible for integers $x, y, z \neq 0$. □
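
The algebraic identities that drive the descent can be checked mechanically. The sketch below is an illustration only, under the assumption that $a + b\zeta_3$ is stored as the integer pair `(a, b)` with $\zeta_3^2 = -1 - \zeta_3$; the helper names are hypothetical. It confirms the factorization of $\alpha^3 + \beta^3$, the “delightful fact” in its undivided form $(\alpha+\beta) + \zeta_3(\alpha+\zeta_3\beta) + \zeta_3^2(\alpha+\zeta_3^2\beta) = 0$, and that $1 - \zeta_3$ is a unit times $\sqrt{-3}$, which is why differences of the factors are unit multiples of $\beta\sqrt{-3}$.

```python
from itertools import product

# a + b*zeta is stored as (a, b); zeta^2 = -1 - zeta (representation assumed here).
def add(s, t):
    return (s[0] + t[0], s[1] + t[1])

def mul(s, t):
    a, b = s
    c, d = t
    return (a * c - b * d, a * d + b * c - b * d)

def cube(s):
    return mul(s, mul(s, s))

ZETA = (0, 1)
ZETA2 = mul(ZETA, ZETA)
SQRT_M3 = (1, 2)                      # sqrt(-3) = 1 + 2*zeta

# zeta is a unit (zeta^3 = 1), and 1 - zeta = zeta^2 * sqrt(-3).
assert cube(ZETA) == (1, 0)
assert mul(ZETA2, SQRT_M3) == (1, -1)

small = list(product(range(-3, 4), repeat=2))
for alpha, beta in product(small, repeat=2):
    f0 = add(alpha, beta)
    f1 = add(alpha, mul(ZETA, beta))
    f2 = add(alpha, mul(ZETA2, beta))
    # (alpha + beta)(alpha + zeta beta)(alpha + zeta^2 beta) = alpha^3 + beta^3
    assert mul(f0, mul(f1, f2)) == add(cube(alpha), cube(beta))
    # f0 + zeta*f1 + zeta^2*f2 = 0, because 1 + zeta + zeta^2 = 0
    assert add(f0, add(mul(ZETA, f1), mul(ZETA2, f2))) == (0, 0)
```

Dividing the last identity by $\sqrt{-3}$ gives the form used in the proof.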


7.7 *Fermat’s last theorem for n = 3 135 


Exercises 
The impossibility of $\alpha^3 + \beta^3 + \gamma^3 = 0$ for nonzero $\alpha, \beta, \gamma \in \mathbb{Z}[\zeta_3]$ is probably the
most difficult result in this book, so the reader may find the ideas in the proof a
little slippery. The following exercises aim to provide a better grip by using the
ideas again in a similar problem. They prove a theorem of Legendre that

\[ \alpha^3 + \beta^3 + 3\gamma^3 = 0 \quad\text{is impossible for nonzero } \alpha, \beta, \gamma \in \mathbb{Z}[\zeta_3]. \]

As above, the algebra introduces unknown units, so we need to show impossibility
of the more general equation

\[ \alpha^3 + \beta^3 + \varepsilon(\sqrt{-3})^{3n+2}\gamma^3 = 0 \qquad (*) \]

where $\sqrt{-3}$ does not divide $\gamma$ and $\varepsilon$ is a unit of $\mathbb{Z}[\zeta_3]$. The usual preliminary step of
dividing by any common factors allows us to assume that $\alpha, \beta, \gamma$ have no common
prime divisor and are not divisible by $\sqrt{-3}$. We also assume that the exponent of
$\sqrt{-3}$ in (*) is as small as possible.


7.7.1 Explain why $\alpha^3 + \beta^3 + 3\gamma^3 = 0$ is a specialization of (*).
7.7.2 Reducing (*) mod 9 and using the property of cubes, show that $n \geq 1$ in (*).


7.7.3 Now use the congruence of factors of $\alpha^3 + \beta^3$, their relative primality, and
unique prime factorization in $\mathbb{Z}[\zeta_3]$, to conclude from (*) that two of

\[ \frac{\alpha + \beta}{\sqrt{-3}},\quad \frac{\alpha + \zeta_3\beta}{\sqrt{-3}},\quad \frac{\alpha + \zeta_3^2\beta}{\sqrt{-3}} \in \mathbb{Z}[\zeta_3] \]

are units times cubes and the third is a unit times a cube times $(\sqrt{-3})^{3n-1}$.


7.7.4 Deduce, from Exercise 7.7.3 and the “delightful fact”, that there is a valid
equation of the form

\[ \varepsilon_1\alpha_1^3 + \varepsilon_2\beta_1^3 + \varepsilon_3(\sqrt{-3})^{3n-1}\gamma_1^3 = 0, \]

or equivalently

\[ \alpha_1^3 + \varepsilon_4\beta_1^3 + \varepsilon_5(\sqrt{-3})^{3n-1}\gamma_1^3 = 0, \qquad (**) \]

where $\varepsilon_4, \varepsilon_5$ are units of $\mathbb{Z}[\zeta_3]$ and $\alpha_1, \beta_1, \gamma_1$ are not divisible by $\sqrt{-3}$.


7.7.5 By reducing (**) mod 3, show that $\varepsilon_4 = \pm 1$ (where is $n \geq 1$ used?). Deduce
that (**) is equivalent to an equation of the form (*), but with a smaller
power of $\sqrt{-3}$.


7.7.6 Conclude that equation (*) does not hold for any nonzero $\alpha, \beta, \gamma \in \mathbb{Z}[\zeta_3]$.


7.8 Discussion


In recent chapters we have seen many ways in which algebraic numbers
illuminate the ordinary integers, and Diophantine equations in particular.
At the simplest level, the multiplicative norm enables us to do such things
as:


• Generate solutions of the Pell equation $x^2 - ny^2 = 1$ from the powers
of $x_1 + y_1\sqrt{n}$, where $(x_1, y_1)$ is the smallest natural number solution.


• Find all rational solutions of $x^3 + y^3 = z^3 + w^3$.


At a more sophisticated level, certain rings of algebraic integers can be
shown to have unique prime factorization, among them $\mathbb{Z}[i]$, $\mathbb{Z}[\sqrt{-2}]$ and
$\mathbb{Z}[\zeta_3]$. This enables us to analyze algebraic factorizations such as

\[
\begin{aligned}
x^2 + y^2 &= (x - yi)(x + yi)\\
x^3 + y^3 &= (x + y)(x + \zeta_3 y)(x + \zeta_3^2 y)
\end{aligned}
\]

and find solutions of certain equations in which they appear. For example:


• The primitive solutions of the Pythagorean equation $x^2 + y^2 = z^2$ can
be found by factorizing $x^2 + y^2$ in $\mathbb{Z}[i]$.


• Fermat’s theorem that each prime $p = 4n + 1$ is a sum of two squares
can be proved by showing that p divides $m^2 + 1$, and factorizing
$m^2 + 1$ in $\mathbb{Z}[i]$.


• The integer solutions of the Bachet equation $y^3 = x^2 + 2$ can be found
by factorizing $x^2 + 2$ in $\mathbb{Z}[\sqrt{-2}]$.


• Nonexistence of natural number solutions of $x^3 + y^3 = z^3$ can be
proved by factorizing $x^3 + y^3$ in $\mathbb{Z}[\zeta_3]$.


But so far we have proved that unique prime factorization holds only
in $\mathbb{Z}$, $\mathbb{Z}[i]$, $\mathbb{Z}[\sqrt{-2}]$, and $\mathbb{Z}[\zeta_3]$, and we have seen that it does not hold in
$\mathbb{Z}[\sqrt{-3}]$. Therefore, there is no guarantee that we can push on with this
approach to $\mathbb{Z}[\sqrt{-5}]$, $\mathbb{Z}[\sqrt{-6}]$, ... or to $\mathbb{Z}[\zeta_n]$ for higher values of n.

In Chapter 11 we show that unique prime factorization fails again in
$\mathbb{Z}[\sqrt{-5}]$, and this time it cannot be repaired by filling obvious “holes” in
$\mathbb{Z}[\sqrt{-5}]$, as we did in $\mathbb{Z}[\sqrt{-3}]$. The situation calls for some “ideal numbers”
from mathematical outer space; it is not clear that they exist in $\mathbb{C}$
where the usual algebraic numbers come from.

This dire situation was first recognized by Kummer in the 1840s, and it
came to light when Lamé (1847) published a faulty proof of Fermat’s last
theorem that $x^n + y^n \neq z^n$ for natural numbers x, y, z and n > 2. Lamé used
the factorization

\[ x^n + y^n = (x + y)(x + \zeta_n y)\cdots(x + \zeta_n^{n-1} y) \]

where $\zeta_n = \cos\frac{2\pi}{n} + i\sin\frac{2\pi}{n}$, as we did in Section 7.7 with n = 3. However,
he assumed that $\mathbb{Z}[\zeta_n]$ has unique prime factorization, and Kummer
showed that this is false for $n \geq 23$. Kummer was no doubt aware that it
also fails in rings of quadratic integers such as $\mathbb{Z}[\sqrt{-5}]$, but he was more
interested in $\mathbb{Z}[\zeta_n]$, called the cyclotomic (“circle-cutting”) integers because
$1, \zeta_n, \zeta_n^2, \ldots, \zeta_n^{n-1}$ cut the unit circle into n equal parts.

He introduced “ideal numbers” to restore unique prime factorization in
$\mathbb{Z}[\zeta_n]$, and it enabled him to prove Fermat’s last theorem for many values
of n, though not all. More importantly, the “ideal” concept outgrew the
cyclotomic integers and spread into algebra and algebraic geometry as well
as number theory. The simpler examples like $\mathbb{Z}[\sqrt{-3}]$ and $\mathbb{Z}[\sqrt{-5}]$ were
pointed out by Dedekind in the 1870s, in the course of giving a down-to-earth
explanation of “ideal numbers”. We follow Dedekind’s approach in
Chapter 11, and an English translation of Dedekind’s own exposition may
be found in Dedekind (1877).

It should be mentioned that Fermat’s last theorem for n = 4 and n = 7 
may be proved using only ordinary integers. A proof for n = 4 was given 
by Fermat himself—one proof in number theory that he actually wrote 
down—and variations of it appear in many books. Two variations may be 
found in Stillwell (1998), pp. 131-134. An elementary proof for n = 7 
was discovered by V. A. Lebesgue (1840), and it was further simplified by 
Genocchi (1876), following the remarkable strategy of forming the sum 
of seventh powers of the roots of a cubic equation. This little-known 
proof may be found in Nagell (1951), pp. 248—251, and Ribenboim (1999), 
pp. 57-62. 


PREVIEW 


In this chapter we prove that every natural number is the sum of 
four integer squares, following a proof of Hurwitz. This proof has 
been chosen because it resembles the proof of Fermat’s two square 
theorem already given in Chapter 6, and because it introduces the 
quaternions, a mathematical structure with many beautiful algebraic 
and geometric features. 

We define the quaternions to be the matrices $\begin{pmatrix} a+di & b+ci \\ -b+ci & a-di \end{pmatrix}$,
where $a, b, c, d \in \mathbb{R}$, after verifying that the matrices $\begin{pmatrix} a & b \\ -b & a \end{pmatrix}$
behave like the complex numbers. In this representation, the norm
is just the determinant, and its multiplicative property follows from
the multiplicative property of determinants. On complex-number
matrices, the determinant gives again the two square identity, and
on quaternions it gives a four square identity.


“Quaternion integers” should be the quaternions with $a, b, c, d \in \mathbb{Z}$.
However, these lack the division property. To bring it in we augment
them with “half integer points” to form the so-called Hurwitz integers.
We can then establish a Euclidean algorithm and a prime divisor
property. (The quaternion product is noncommutative, which
is a slight obstacle, but we get around it by taking care always to
multiply and divide on the same side.)


The proof of the four square theorem then follows the proof of the
two square theorem very closely.

• Using conjugates, any ordinary prime that is not a Hurwitz
prime is shown to be a sum of four squares.


• If an ordinary prime p divides a Hurwitz integer product $\alpha\beta$,
then p divides $\alpha$ or p divides $\beta$.


• Any ordinary odd prime p divides a natural number of the
form $1 + l^2 + m^2$ (analogous to Lagrange’s lemma in Section
6.5 but easier to prove).

• The number $1 + l^2 + m^2$ factorizes in the Hurwitz integers.
Hence p is not a Hurwitz prime and therefore p is a sum of
four squares.

• Since every natural number n is a product of odd primes and
the prime 2 (which equals $0^2 + 0^2 + 1^2 + 1^2$), the four square
identity shows that n is a sum of four squares.


8.1 Real matrices and C 


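In this chapter we introduce 4-dimensional “hypercomplex numbers” called
quaternions. A quaternion is easily defined as a 2 × 2 matrix of complex
numbers, but to see why we might expect matrices to behave like numbers,
we first show how to model the complex numbers by 2 × 2 real matrices.
For each $a + bi \in \mathbb{C}$, with real a and b, consider the matrix

\[ M(a + bi) = \begin{pmatrix} a & b \\ -b & a \end{pmatrix}. \]

It is easy to check (exercise) that

\[
\begin{aligned}
M(a_1 + b_1 i) + M(a_2 + b_2 i) &= M(a_1 + a_2 + (b_1 + b_2)i)\\
 &= M((a_1 + b_1 i) + (a_2 + b_2 i)),\\
M(a_1 + b_1 i)\,M(a_2 + b_2 i) &= M(a_1 a_2 - b_1 b_2 + (a_1 b_2 + b_1 a_2)i)\\
 &= M((a_1 + b_1 i)(a_2 + b_2 i)).
\end{aligned}
\]

Thus matrix sum and product correspond to complex sum and product, and
therefore the matrices

\[ \begin{pmatrix} a & b \\ -b & a \end{pmatrix} \quad\text{for } a, b \in \mathbb{R} \]

behave exactly like the complex numbers $a + bi$.

Another way to see this is to write

\[ \begin{pmatrix} a & b \\ -b & a \end{pmatrix} = a\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + b\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = a\mathbf{1} + b\mathbf{i}. \]

The identity matrix

\[ \mathbf{1} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \]

behaves like the number 1, and

\[ \mathbf{i} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \]

behaves like $\sqrt{-1}$. Indeed

\[ \mathbf{i}^2 = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} = -\mathbf{1}. \]

Not only does this matrix representation of $\mathbb{C}$ have natural counterparts
of 1 and i, it also has a natural interpretation of the norm on $\mathbb{C}$ as the
determinant. This is so because

\[ \operatorname{norm}(a + bi) = a^2 + b^2 = \det\begin{pmatrix} a & b \\ -b & a \end{pmatrix}. \]

The multiplicative property of the norm follows from the multiplicative
property of the determinant:

\[ \det\begin{pmatrix} a_1 & b_1 \\ -b_1 & a_1 \end{pmatrix}\det\begin{pmatrix} a_2 & b_2 \\ -b_2 & a_2 \end{pmatrix} = \det\left(\begin{pmatrix} a_1 & b_1 \\ -b_1 & a_1 \end{pmatrix}\begin{pmatrix} a_2 & b_2 \\ -b_2 & a_2 \end{pmatrix}\right). \qquad (*) \]

And since the matrix product on the right-hand side equals

\[ \begin{pmatrix} a_1 a_2 - b_1 b_2 & a_1 b_2 + b_1 a_2 \\ -a_1 b_2 - b_1 a_2 & a_1 a_2 - b_1 b_2 \end{pmatrix}, \]

equation (*) gives a new way to derive the Diophantus two square identity.
Replacing each $\det\begin{pmatrix} a & b \\ -b & a \end{pmatrix}$ in (*) by $a^2 + b^2$ we get

\[ (a_1^2 + b_1^2)(a_2^2 + b_2^2) = (a_1 a_2 - b_1 b_2)^2 + (a_1 b_2 + b_1 a_2)^2. \]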
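
The correspondence between complex numbers and real matrices can be checked on integer samples. This is a minimal sketch with hypothetical helper names (`M`, `mat_add`, `mat_mul`, `det`); it represents $a + bi$ by the pair of integers `a, b` so that all arithmetic is exact, and it confirms the sum and product rules, $\mathbf{i}^2 = -\mathbf{1}$, and the two square identity obtained from determinants.

```python
from itertools import product

def M(a, b):
    """The 2x2 real matrix modelling a + bi (name chosen for this sketch)."""
    return ((a, b), (-b, a))

def mat_add(P, Q):
    return tuple(tuple(P[i][j] + Q[i][j] for j in range(2)) for i in range(2))

def mat_mul(P, Q):
    return tuple(
        tuple(sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def det(P):
    return P[0][0] * P[1][1] - P[0][1] * P[1][0]

for a1, b1, a2, b2 in product(range(-4, 5), repeat=4):
    # Matrix sum and product mirror complex sum and product.
    assert mat_add(M(a1, b1), M(a2, b2)) == M(a1 + a2, b1 + b2)
    assert mat_mul(M(a1, b1), M(a2, b2)) == M(a1 * a2 - b1 * b2, a1 * b2 + b1 * a2)
    # det is multiplicative, which is the Diophantus two square identity.
    lhs = det(M(a1, b1)) * det(M(a2, b2))
    assert lhs == (a1 * a2 - b1 * b2) ** 2 + (a1 * b2 + b1 * a2) ** 2

# The matrix i squares to -1.
assert mat_mul(M(0, 1), M(0, 1)) == M(-1, 0)
```

For instance, with $(a_1, b_1) = (1, 2)$ and $(a_2, b_2) = (3, 4)$ the identity reads $5 \cdot 25 = (-5)^2 + 10^2$.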
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A geometric property of multiplication 


Here is a good place to point out a property of multiplication that we have
previously observed in special cases in Chapters 6 and 7: multiplication of
all members of $\mathbb{C}$ by some fixed nonzero $z_0 \in \mathbb{C}$ is a similarity or
shape-preserving map, that is, it multiplies all distances by a constant (namely,
$|z_0|$).

This is because the distance between complex numbers $z_1$ and $z_2$ equals
$|z_2 - z_1|$. When multiplied by $z_0$, $z_1$ and $z_2$ are sent to $z_0 z_1$ and $z_0 z_2$, the
distance between which is

\[ |z_0 z_2 - z_0 z_1| = |z_0(z_2 - z_1)| = |z_0|\,|z_2 - z_1| \]

by the multiplicative property of the norm.

We observed cases of this in Chapter 6, where multiplying $\mathbb{Z}[i]$ by some
$\beta \neq 0$ gave a grid of the same square shape, and in Chapter 7 where
multiplying $\mathbb{Z}[\sqrt{-2}]$ by $\beta \neq 0$ gave a grid of the same rectangular shape. In
Section 8.4 we use the multiplicative property of the quaternion norm to
show similarly that any nonzero multiple of the quaternion “integers” is a
grid of the same shape in $\mathbb{R}^4$. (Here we use the word “grid” rather loosely,
since the quaternion integers are not simply a grid of 4-dimensional cubes).


Exercises 


8.1.1 Check that

\[
\begin{aligned}
M(a_1 + b_1 i) + M(a_2 + b_2 i) &= M((a_1 + b_1 i) + (a_2 + b_2 i)),\\
M(a_1 + b_1 i)\,M(a_2 + b_2 i) &= M((a_1 + b_1 i)(a_2 + b_2 i)).
\end{aligned}
\]

Although multiplication by $z_0$ leaves the shape of any figure in the plane $\mathbb{C}$
unaltered, the figure may be rotated.

8.1.2 How is the amount of rotation related to $z_0$?


8.2 Complex matrices and H 


For each pair $\alpha, \beta \in \mathbb{C}$ we consider the matrix

\[ \begin{pmatrix} \alpha & \beta \\ -\bar\beta & \bar\alpha \end{pmatrix} \]

which we call a quaternion.

The set of quaternions is called $\mathbb{H}$, after Hamilton, who discovered
them in 1843 (the matrix definition, however, is due to Cayley (1858)).

It is easy to check that the sum and difference of quaternions are again
quaternions. So, too, is the product because

\[ \begin{pmatrix} \alpha_1 & \beta_1 \\ -\bar\beta_1 & \bar\alpha_1 \end{pmatrix}\begin{pmatrix} \alpha_2 & \beta_2 \\ -\bar\beta_2 & \bar\alpha_2 \end{pmatrix} = \begin{pmatrix} \alpha_3 & \beta_3 \\ -\bar\beta_3 & \bar\alpha_3 \end{pmatrix} \]

where

\[ \alpha_3 = \alpha_1\alpha_2 - \beta_1\bar\beta_2, \qquad \beta_3 = \alpha_1\beta_2 + \beta_1\bar\alpha_2. \]

This can be verified by matrix multiplication and complex conjugation.

The norm of a quaternion q is defined to be its determinant, hence if
$q = \begin{pmatrix} \alpha & \beta \\ -\bar\beta & \bar\alpha \end{pmatrix}$ then norm(q) is

\[ \det\begin{pmatrix} \alpha & \beta \\ -\bar\beta & \bar\alpha \end{pmatrix} = \alpha\bar\alpha + \beta\bar\beta = |\alpha|^2 + |\beta|^2. \]

The multiplicative property of determinants now gives a “complex two
square identity” similar to the Diophantus two square identity:

\[ (|\alpha_1|^2 + |\beta_1|^2)(|\alpha_2|^2 + |\beta_2|^2) = |\alpha_1\alpha_2 - \beta_1\bar\beta_2|^2 + |\alpha_1\beta_2 + \beta_1\bar\alpha_2|^2. \]

This identity was discovered by Gauss around 1820, but he left it
unpublished.


Remark on associativity


It is easy to find quaternions $q_1$ and $q_2$ such that $q_1 q_2 \neq q_2 q_1$ (exercise).
In fact, they include the quaternion units discussed in the next section.
However, quaternion multiplication is at least associative,

\[ q_1(q_2 q_3) = (q_1 q_2) q_3, \]

since matrix multiplication is associative. This property can be checked
laboriously by computing the matrices on both sides, but it is preferable to
recall that each matrix M represents a function, namely the linear map

\[ \begin{pmatrix} x \\ y \end{pmatrix} \mapsto M\begin{pmatrix} x \\ y \end{pmatrix}, \]

and that matrix multiplication represents composition of functions.

Function composition is always associative, simply because $f_1(f_2 f_3)$
and $(f_1 f_2) f_3$ are the same function, since both send X to $f_1(f_2(f_3(X)))$.
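
The claims of this section can be tested on random samples. The sketch below is illustrative, with hypothetical helper names; it builds quaternions as 2 × 2 tuples of Python `complex` numbers with small integer parts (so floating-point arithmetic is exact here) and checks the product formula, the norm as determinant, the complex two square identity, and associativity, and exhibits one noncommuting pair.

```python
import random

def quat(alpha, beta):
    """Quaternion with first row (alpha, beta), as a 2x2 complex matrix."""
    return ((alpha, beta), (-beta.conjugate(), alpha.conjugate()))

def mat_mul(P, Q):
    return tuple(
        tuple(sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def det(P):
    return P[0][0] * P[1][1] - P[0][1] * P[1][0]

def abs2(z):
    return z.real ** 2 + z.imag ** 2

rng = random.Random(0)

def gaussian():
    # complex number with small integer real and imaginary parts
    return complex(rng.randint(-5, 5), rng.randint(-5, 5))

for _ in range(500):
    a1, b1, a2, b2, a3, b3 = (gaussian() for _ in range(6))
    q1, q2, q3 = quat(a1, b1), quat(a2, b2), quat(a3, b3)
    a_prod = a1 * a2 - b1 * b2.conjugate()
    b_prod = a1 * b2 + b1 * a2.conjugate()
    assert mat_mul(q1, q2) == quat(a_prod, b_prod)        # product is a quaternion
    assert det(q1) == abs2(a1) + abs2(b1)                 # norm = determinant
    assert (abs2(a1) + abs2(b1)) * (abs2(a2) + abs2(b2)) == abs2(a_prod) + abs2(b_prod)
    assert mat_mul(q1, mat_mul(q2, q3)) == mat_mul(mat_mul(q1, q2), q3)

# One noncommuting pair: alpha = 0 with beta = 1, and alpha = 0 with beta = i.
I = quat(0j, 1 + 0j)
J = quat(0j, 1j)
assert mat_mul(I, J) != mat_mul(J, I)
```

Integer-valued samples are used only to avoid rounding; the identities themselves hold for all complex $\alpha, \beta$.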
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Exercises 


8.2.1 Verify that the product of two quaternions is the matrix claimed. 


The matrix representation of quaternions also shows that a nonzero quaternion 
has a multiplicative inverse, namely its matrix inverse. 


8.2.3 Compute the inverse of a nonzero quaternion $q = \begin{pmatrix} \alpha & \beta \\ -\bar\beta & \bar\alpha \end{pmatrix}$ and verify
that $q^{-1}$ is also a quaternion.


The complex two square identity is one way to derive the four square identity,
written down in the next section.


8.2.4 By writing

\[ \alpha_1 = a_1 + d_1 i, \quad \beta_1 = b_1 + c_1 i, \quad \alpha_2 = a_2 + d_2 i, \quad \beta_2 = b_2 + c_2 i, \]

express $(a_1^2 + b_1^2 + c_1^2 + d_1^2)(a_2^2 + b_2^2 + c_2^2 + d_2^2)$ as a sum of four squares.


8.3 The quaternion units 


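If we write $\alpha = a + di$ and $\beta = b + ci$, where $a, b, c, d \in \mathbb{R}$, then each
quaternion can be viewed as a linear combination of four special matrices
$\mathbf{1}, \mathbf{i}, \mathbf{j}, \mathbf{k}$ called quaternion units.

\[
\begin{aligned}
\begin{pmatrix} \alpha & \beta \\ -\bar\beta & \bar\alpha \end{pmatrix}
 &= \begin{pmatrix} a+di & b+ci \\ -b+ci & a-di \end{pmatrix}\\
 &= a\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + b\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} + c\begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix} + d\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}\\
 &= a\mathbf{1} + b\mathbf{i} + c\mathbf{j} + d\mathbf{k}.
\end{aligned}
\]

The matrices $\mathbf{1}, \mathbf{i}, \mathbf{j}, \mathbf{k}$ are quaternions of norm 1 that satisfy the following
easily verified relations:

\[
\begin{aligned}
\mathbf{i}^2 = \mathbf{j}^2 = \mathbf{k}^2 &= -\mathbf{1},\\
\mathbf{i}\mathbf{j} = \mathbf{k} &= -\mathbf{j}\mathbf{i},\\
\mathbf{j}\mathbf{k} = \mathbf{i} &= -\mathbf{k}\mathbf{j},\\
\mathbf{k}\mathbf{i} = \mathbf{j} &= -\mathbf{i}\mathbf{k}.
\end{aligned}
\]

Thus the product of quaternions is generally noncommutative:

\[ q_1 q_2 \neq q_2 q_1. \]

Apart from this, however, the quaternions have the same basic properties as
numbers. They form an abelian group under addition, the nonzero quaternions
form a group under multiplication, and we also have

\[
\begin{aligned}
q_1(q_2 + q_3) &= q_1 q_2 + q_1 q_3,\\
(q_2 + q_3) q_1 &= q_2 q_1 + q_3 q_1
\end{aligned}
\]

(left and right distributive laws).


The four square identity


If $q = a\mathbf{1} + b\mathbf{i} + c\mathbf{j} + d\mathbf{k}$ then norm(q) is

\[ \det\begin{pmatrix} a+di & b+ci \\ -b+ci & a-di \end{pmatrix} = a^2 + b^2 + c^2 + d^2. \]

Since $\det(q_1)\det(q_2) = \det(q_1 q_2)$, we can also write the “complex two
square identity” as a real four square identity, which turns out to be

\[
\begin{aligned}
(a_1^2 + b_1^2 + c_1^2 + d_1^2)(a_2^2 + b_2^2 + c_2^2 + d_2^2) ={}& (a_1 a_2 - b_1 b_2 - c_1 c_2 - d_1 d_2)^2\\
 &+ (a_1 b_2 + b_1 a_2 + c_1 d_2 - d_1 c_2)^2\\
 &+ (a_1 c_2 - b_1 d_2 + c_1 a_2 + d_1 b_2)^2\\
 &+ (a_1 d_2 + b_1 c_2 - c_1 b_2 + d_1 a_2)^2.
\end{aligned}
\]

Remarkably, the four square identity was discovered by Euler in 1748,
nearly 100 years before the discovery of quaternions. Euler hoped to use it
to prove that every natural number is the sum of four squares, by proving
also that every prime is the sum of four squares. This was first proved
by Lagrange in 1770. We can now give a simpler proof with the help of
quaternions. This will be done in the next few sections.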
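
The unit relations and the four square identity can both be checked with a few lines of exact integer arithmetic. In this sketch (an illustration; the names `qmul` and `norm` are hypothetical) a quaternion $a\mathbf{1} + b\mathbf{i} + c\mathbf{j} + d\mathbf{k}$ is the tuple `(a, b, c, d)`, and the product is defined by the four expressions that appear squared on the right-hand side of the identity.

```python
from itertools import product

def qmul(p, q):
    """Quaternion product on 4-tuples (a, b, c, d) = a1 + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def norm(q):
    return sum(t * t for t in q)

def neg(q):
    return tuple(-t for t in q)

ONE, I, J, K = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

# Relations among the quaternion units.
assert qmul(I, I) == qmul(J, J) == qmul(K, K) == neg(ONE)
assert qmul(I, J) == K == neg(qmul(J, I))
assert qmul(J, K) == I == neg(qmul(K, J))
assert qmul(K, I) == J == neg(qmul(I, K))

# Four square identity: the norm is multiplicative.
for p in product(range(-2, 3), repeat=4):
    for q in product(range(-2, 3), repeat=4):
        assert norm(p) * norm(q) == norm(qmul(p, q))

print(qmul((1, 2, 3, 4), (5, 6, 7, 8)))   # (-60, 12, 30, 24)
```

The printed product illustrates the identity: $30 \cdot 174 = 5220 = 60^2 + 12^2 + 30^2 + 24^2$.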


Exercises 


As mentioned in the previous section, Hamilton did not introduce quaternions as
particular 2 × 2 matrices. He defined them directly as abstract objects of the form
$a\mathbf{1} + b\mathbf{i} + c\mathbf{j} + d\mathbf{k}$, with multiplication defined by the following rules:


\[ \mathbf{i}^2 = \mathbf{j}^2 = \mathbf{k}^2 = \mathbf{i}\mathbf{j}\mathbf{k} = -1. \]
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assume associativity? 


One can similarly find the product of any two units, and then the product of any 
two quaternions. 


8.3.2 Explain the role of the distributive laws in computing products. 
The eight units and their negatives form an interesting finite group. 


8.3.3 Show that $Q = \{\pm\mathbf{1}, \pm\mathbf{i}, \pm\mathbf{j}, \pm\mathbf{k}\}$ is closed under products and inverses, and
hence forms a (nonabelian) group under the quaternion product.


8.3.4 Show that the products of any two of $\mathbf{i}, \mathbf{j}, \mathbf{k}$, or their negatives, make up all
of Q.


8.3.5 Deduce from Exercise 8.3.4 that any proper subgroup of Q (that is, any
subgroup that is not all of Q) is abelian.


Q is in fact the smallest nonabelian group whose proper subgroups are all abelian.


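8.4 $\mathbb{Z}[\mathbf{i}, \mathbf{j}, \mathbf{k}]$


From now on we write the quaternion $\mathbf{1}$ simply as 1 and omit it altogether
as a term in a product. Thus the typical quaternion will be written

\[ q = a + b\mathbf{i} + c\mathbf{j} + d\mathbf{k}, \quad\text{where } a, b, c, d \in \mathbb{R}. \]

Which of these objects should be regarded as “integers”?
One’s first thought is that

\[ \mathbb{Z}[\mathbf{i}, \mathbf{j}, \mathbf{k}] = \{a + b\mathbf{i} + c\mathbf{j} + d\mathbf{k} : a, b, c, d \in \mathbb{Z}\} \]

should be the “quaternion integers”, analogous to the Gaussian integers
$\mathbb{Z}[i]$. Sum, difference and product of members of $\mathbb{Z}[\mathbf{i}, \mathbf{j}, \mathbf{k}]$ are again
members of $\mathbb{Z}[\mathbf{i}, \mathbf{j}, \mathbf{k}]$, and

\[ \operatorname{norm}(a + b\mathbf{i} + c\mathbf{j} + d\mathbf{k}) = a^2 + b^2 + c^2 + d^2 \]

is an ordinary integer, which we can use to find “primes” in $\mathbb{Z}[\mathbf{i}, \mathbf{j}, \mathbf{k}]$.

Example. $2 + \mathbf{i} + \mathbf{j} + \mathbf{k}$ is a prime of $\mathbb{Z}[\mathbf{i}, \mathbf{j}, \mathbf{k}]$.

This is so because $\operatorname{norm}(2 + \mathbf{i} + \mathbf{j} + \mathbf{k}) = 2^2 + 1^2 + 1^2 + 1^2 = 7$, which is
a prime in $\mathbb{Z}$. Hence $2 + \mathbf{i} + \mathbf{j} + \mathbf{k}$ is not the product of members of $\mathbb{Z}[\mathbf{i}, \mathbf{j}, \mathbf{k}]$
with smaller norm.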


146 8 The four square theorem 


However, there is trouble when we attempt division with remainder:
the set of “integer multiples” of a fixed quaternion has the wrong shape.
Even though multiples are not visualizable as they are in $\mathbb{Z}[i]$ (since the
quaternions

\[ \mathbb{H} = \{a + b\mathbf{i} + c\mathbf{j} + d\mathbf{k} : a, b, c, d \in \mathbb{R}\} \]

form a 4-dimensional space $\mathbb{R}^4$) we can nevertheless talk about distance
and angles in $\mathbb{R}^4$ and reason geometrically.


Multiples in $\mathbb{Z}[\mathbf{i}, \mathbf{j}, \mathbf{k}]$


We interpret $1, \mathbf{i}, \mathbf{j}, \mathbf{k}$ as the unit points on four perpendicular axes in $\mathbb{R}^4$.
Then the quaternion norm $a^2 + b^2 + c^2 + d^2$ is just the square of the distance
$|a + b\mathbf{i} + c\mathbf{j} + d\mathbf{k}|$ of $a + b\mathbf{i} + c\mathbf{j} + d\mathbf{k}$ from O.

More generally, $\operatorname{norm}(q_1 - q_2)$ is the square of the distance $|q_1 - q_2|$
between the quaternions $q_1$ and $q_2$.

Now, since the norm is multiplicative, we have

\[ |q q_1 - q q_2| = |q(q_1 - q_2)| = |q|\,|q_1 - q_2|, \]

so multiplying all of $\mathbb{H} = \mathbb{R}^4$ by a quaternion q multiplies all distances by
the constant $|q|$. (Since $q \cdot 0 = 0$, multiplication by q also leaves the origin
fixed, so when $|q| = 1$ this operation can be regarded as a “rotation” of $\mathbb{R}^4$
about O.)

It follows, if $q \neq 0$, that multiplication by q leaves all angles unchanged.
In particular, the multiples $\beta, \beta\mathbf{i}, \beta\mathbf{j}$, and $\beta\mathbf{k}$ of $1, \mathbf{i}, \mathbf{j}, \mathbf{k}$ by a quaternion
$\beta \neq 0$ are each at distance $|\beta|$ from O and in perpendicular directions, like
$1, \mathbf{i}, \mathbf{j}, \mathbf{k}$.

Any multiple of $\beta$ by an element of $\mathbb{Z}[\mathbf{i}, \mathbf{j}, \mathbf{k}]$ is just a sum of elements
$\pm\beta, \pm\beta\mathbf{i}, \pm\beta\mathbf{j}$ and $\pm\beta\mathbf{k}$. Hence the multiples of $\beta$ lie at corners of a
“grid” like the points of $\mathbb{Z}[\mathbf{i}, \mathbf{j}, \mathbf{k}]$ itself, a grid of what we call 4-cubes.
The only difference is that the grid of multiples of $\beta$ is magnified by $|\beta|$
and possibly rotated.


Exercises 


The rotations of $\mathbb{R}^4$ obtained by multiplying each point by a quaternion $q \neq 1$ with
$|q| = 1$ are unlike rotations of $\mathbb{R}^3$ in that they have no “axis” of fixed points.


8.4.1 Show that multiplication by a quaternion $q \neq 1$ fixes only the origin.
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The detection of quaternion primes by their norms, which are sums of squares, 
allows us to prove already that there are infinitely many of them. 


8.4.2 Without assuming that every natural number is a sum of four squares, show 
that there are infinitely many quaternion primes. 


8.5 The Hurwitz integers 


Division with remainder 


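Just as in $\mathbb{Z}[i]$, so too in $\mathbb{Z}[\mathbf{i}, \mathbf{j}, \mathbf{k}]$ we look at the grid of multiples of $\beta$ to
find the remainder when $\alpha$ is divided by $\beta$. It is $\alpha - \mu\beta$, where $\mu\beta$ is the
nearest corner of the grid.

Unfortunately, we do not always have $|\alpha - \mu\beta| < |\beta|$. There is one
exceptional position: if $\alpha$ lies at the center of one of the 4-cubes, then
$|\alpha - \mu\beta| = |\beta|$. This is because the center-to-corner distance in any 4-cube
equals the length of a side. For example, the center

\[ \frac{1}{2} + \frac{\mathbf{i}}{2} + \frac{\mathbf{j}}{2} + \frac{\mathbf{k}}{2} \]

of a unit 4-cube with a corner at O is at distance

\[ \sqrt{\left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2} = \sqrt{1} = 1 \]

from O.

Thus the division property fails for $\mathbb{Z}[\mathbf{i}, \mathbf{j}, \mathbf{k}]$. In this respect, $\mathbb{Z}[\mathbf{i}, \mathbf{j}, \mathbf{k}]$ is
like $\mathbb{Z}[\sqrt{-3}]$ rather than $\mathbb{Z}[i]$. Indeed, we fix the problem exactly as we did
for $\mathbb{Z}[\sqrt{-3}]$, by adding the exceptional points as extra integers.

Since each midpoint is obtained by adding $\frac{1 + \mathbf{i} + \mathbf{j} + \mathbf{k}}{2}$ to some member of
$\mathbb{Z}[\mathbf{i}, \mathbf{j}, \mathbf{k}]$, we want the set of quaternions of the form

\[ \frac{1 + \mathbf{i} + \mathbf{j} + \mathbf{k}}{2} + a + b\mathbf{i} + c\mathbf{j} + d\mathbf{k} \quad\text{for } a, b, c, d \in \mathbb{Z} \]

together with those in $\mathbb{Z}[\mathbf{i}, \mathbf{j}, \mathbf{k}]$,

\[ a + b\mathbf{i} + c\mathbf{j} + d\mathbf{k} \quad\text{for } a, b, c, d \in \mathbb{Z}. \]

A single formula that embraces both these sets of points is

\[ A\,\frac{1 + \mathbf{i} + \mathbf{j} + \mathbf{k}}{2} + B\mathbf{i} + C\mathbf{j} + D\mathbf{k} \quad\text{for } A, B, C, D \in \mathbb{Z}. \]

We get the midpoints for A odd and the points of $\mathbb{Z}[\mathbf{i}, \mathbf{j}, \mathbf{k}]$ for A even. Thus
the quaternions we have constructed to ensure the division property are the
set $\mathbb{Z}\left[\frac{1 + \mathbf{i} + \mathbf{j} + \mathbf{k}}{2}, \mathbf{i}, \mathbf{j}, \mathbf{k}\right]$ of all the integer combinations of

\[ \frac{1 + \mathbf{i} + \mathbf{j} + \mathbf{k}}{2},\ \mathbf{i},\ \mathbf{j},\ \mathbf{k}. \]

The quaternions in $\mathbb{Z}\left[\frac{1 + \mathbf{i} + \mathbf{j} + \mathbf{k}}{2}, \mathbf{i}, \mathbf{j}, \mathbf{k}\right]$ are called the Hurwitz integers,
after Hurwitz, who introduced them in 1896. We are going to follow his
idea of using them to prove that every natural number is the sum of four
integer squares. (This approach may also be found in Hardy and Wright
(1979) and in Samuel (1970).)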

But first, why should these things be regarded as “integers’’? 


1. The sum and difference of Hurwitz integers are clearly Hurwitz in- 
tegers. 


2. It can be checked (with more difficulty) that the product of Hurwitz 
integers is a Hurwitz integer. 


3. It can also be checked that the norm of a Hurwitz integer is an ordi- 
nary integer. 


This Hurwitz integer has norm $\frac{52}{4} = 13$. Since 13 is an ordinary prime,
the Hurwitz integer in question is not the product of Hurwitz integers of
smaller norm, hence it is a Hurwitz prime.
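Properties 2 and 3 above can be spot-checked by machine before attempting a proof. The following Python sketch is only an illustration, not part of the original argument: it stores a quaternion $a + bi + cj + dk$ as a 4-tuple of fractions, and the helper names (`qmul`, `hurwitz`, `is_hurwitz`) and the small sample range are assumptions made for this example.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)

def qmul(p, q):
    """Hamilton product of quaternions stored as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def norm(q):
    """Sum of the squares of the four coordinates."""
    return sum(x * x for x in q)

def hurwitz(A, B, C, D):
    """Coordinates of A(1+i+j+k)/2 + Bi + Cj + Dk (illustrative helper)."""
    return (A * HALF, A * HALF + B, A * HALF + C, A * HALF + D)

def is_hurwitz(q):
    """True if the coordinates are all integers or all odd multiples of 1/2."""
    doubled = [2 * x for x in q]
    if any(x.denominator != 1 for x in doubled):
        return False
    return len({int(x) % 2 for x in doubled}) == 1

# A small sample only: A, B, C, D each range over -1, 0, 1.
sample = [hurwitz(*t) for t in product(range(-1, 2), repeat=4)]
norms_are_integers = all(norm(q).denominator == 1 for q in sample)
products_are_hurwitz = all(is_hurwitz(qmul(p, q)) for p in sample for q in sample)
print(norms_are_integers, products_are_hurwitz)
```

A finite check of this kind is evidence, not a proof; the general statements still need the algebraic verification asked for in the exercises.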


Exercises 


8.5.1 Write each of $1, i, j, k$ in the form

$$A\,\frac{1+i+j+k}{2} + Bi + Cj + Dk \quad \text{for } A,B,C,D \in \mathbb{Z},$$

and thus show that $\mathbb{Z}\left[\frac{1+i+j+k}{2}, i, j, k\right]$ includes $\mathbb{Z}[i,j,k]$. Also show that the
norm of each Hurwitz integer is an ordinary integer.

The units of $\mathbb{Z}\left[\frac{1+i+j+k}{2}, i, j, k\right]$ are the eight units $\pm 1, \pm i, \pm j, \pm k$ of $\mathbb{Z}[i,j,k]$,
together with the 16 midpoints $\pm\frac{1}{2} \pm \frac{i}{2} \pm \frac{j}{2} \pm \frac{k}{2}$ nearest to the origin. Like the units
of $\mathbb{Z}[i]$ or $\mathbb{Z}[\zeta_3]$, they form a group, since the set of them is closed under products
and inverses. However, the group of units of the Hurwitz integers is much more
interesting, because it is larger and also nonabelian.

8.5.2 Show that the 24 units listed above include the product of any two of them. 


8.5.3 Deduce from the product calculations in Exercise 8.5.2 that the 24 units 
include the inverse of any one of them. 
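The closure claims in Exercises 8.5.2 and 8.5.3 can be confirmed numerically before they are proved by hand. The sketch below is an illustration under stated assumptions: quaternions are 4-tuples of fractions, and the names `qmul`, `conj`, and `units` are chosen for this example only.

```python
from fractions import Fraction
from itertools import product

def qmul(p, q):
    """Hamilton product of quaternions stored as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def conj(q):
    """Quaternion conjugate: negate the i, j, k coordinates."""
    a, b, c, d = q
    return (a, -b, -c, -d)

HALF = Fraction(1, 2)
units = set()
# The eight units +-1, +-i, +-j, +-k.
for position in range(4):
    for sign in (1, -1):
        u = [Fraction(0)] * 4
        u[position] = Fraction(sign)
        units.add(tuple(u))
# The sixteen midpoints (+-1 +-i +-j +-k)/2.
for signs in product((HALF, -HALF), repeat=4):
    units.add(signs)

closed = all(qmul(u, v) in units for u in units for v in units)
# For a unit (norm 1) the inverse is the conjugate.
inverses_inside = all(conj(u) in units and qmul(u, conj(u)) == (1, 0, 0, 0)
                      for u in units)
nonabelian = any(qmul(u, v) != qmul(v, u) for u in units for v in units)
print(len(units), closed, inverses_inside, nonabelian)
```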


8.6 Conjugates 
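For any quaternion $q = a + bi + cj + dk$ we call

$$\bar{q} = a - bi - cj - dk$$

the conjugate of $q$. This conjugate has almost the same basic properties as
the complex conjugate:

$$\begin{aligned}
q\bar{q} &= |q|^2, \\
\overline{q_1 + q_2} &= \bar{q}_1 + \bar{q}_2, \\
\overline{q_1 - q_2} &= \bar{q}_1 - \bar{q}_2, \\
\overline{q_1 q_2} &= \bar{q}_2\,\bar{q}_1.
\end{aligned}$$

(The reversal of the product in the last one is due to noncommutative
quaternion multiplication.)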

As in $\mathbb{C}$, the properties of conjugation in $\mathbb{H}$ can be checked by working
out both sides. We use them (much as we did in Section 6.3) to prove
a conditional four square theorem: if $p$ is an ordinary prime but not a
Hurwitz prime then

$$p = a^2 + b^2 + c^2 + d^2 \quad \text{where } 2a, 2b, 2c, 2d \in \mathbb{Z}.$$

Suppose $p$ has a nontrivial Hurwitz integer factorization

$$p = (a + bi + cj + dk)\gamma.$$

Then, taking conjugates of both sides, we get

$$p = \bar{\gamma}(a - bi - cj - dk), \quad \text{since } \bar{p} = p.$$

Multiplying the two expressions for $p$ gives

$$\begin{aligned}
p^2 &= (a + bi + cj + dk)\gamma\bar{\gamma}(a - bi - cj - dk) \\
&= (a + bi + cj + dk)(a - bi - cj - dk)\gamma\bar{\gamma} \quad \text{since } \gamma\bar{\gamma} \text{ is real} \\
&= (a^2 + b^2 + c^2 + d^2)|\gamma|^2,
\end{aligned}$$

where both $a^2 + b^2 + c^2 + d^2$, $|\gamma|^2 > 1$. But the only positive integer
factorization of $p^2$ into two factors greater than 1 is $pp$, hence
$p = a^2 + b^2 + c^2 + d^2$.

Finally, since $a, b, c, d$ are the coefficients of a Hurwitz integer, they
could be half integers, but at any rate $2a, 2b, 2c, 2d \in \mathbb{Z}$. □


Varying the factors of p 


By finding a new factorization of $p$, we now show that any ordinary prime
that is not a Hurwitz prime is the sum of four integer squares.

A Hurwitz integer $\alpha$ with half-integer coordinates can always be written
in the form

$$\alpha = \omega + a' + b'i + c'j + d'k,$$

where $a', b', c', d'$ are even integers, by a suitable choice of signs in the
Hurwitz integer

$$\omega = \frac{\pm 1 \pm i \pm j \pm k}{2}.$$

The norm of $\omega$ is 1, so that $\omega\bar{\omega} = 1$.

Now suppose we have $p = a^2 + b^2 + c^2 + d^2$ for an ordinary prime $p$,
as in the last subsection, and that $a, b, c, d$ are half integers. We first write

$$\begin{aligned}
p &= (a + bi + cj + dk)(a - bi - cj - dk) \\
&= (\omega + a' + b'i + c'j + d'k) \times (\bar{\omega} + a' - b'i - c'j - d'k)
\end{aligned}$$

where $a', b', c', d'$ are even and $\omega$ is as above, so $\omega\bar{\omega} = 1$. Next we insert
$1 = \bar{\omega}\omega$ between the (conjugate) factors just found, and in this way obtain
new conjugate factors of $p$,

$$p = (\omega + a' + b'i + c'j + d'k)\bar{\omega} \times \omega(\bar{\omega} + a' - b'i - c'j - d'k).$$

In the first factor, $\omega$ times $\bar{\omega}$ gives 1 and the even integer terms times $\bar{\omega}$
give integer terms, hence the factor is

$$A + Bi + Cj + Dk \quad \text{for some } A, B, C, D \in \mathbb{Z}.$$

The second factor is its conjugate, hence

$$p = A^2 + B^2 + C^2 + D^2 \quad \text{with } A, B, C, D \in \mathbb{Z}. \qquad \square$$
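The passage from half-integer to integer coordinates can be traced on a concrete case. The Python sketch below is illustrative only: the function name `integer_form` and the sample value $13 = \left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2 + \left(\tfrac{7}{2}\right)^2$ are assumptions chosen for the example. It picks the signs of $\omega$ so that $\alpha - \omega$ has even integer coordinates, then forms $\alpha\bar{\omega}$.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def qmul(p, q):
    """Hamilton product of quaternions stored as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def conj(q):
    """Quaternion conjugate: negate the i, j, k coordinates."""
    a, b, c, d = q
    return (a, -b, -c, -d)

def integer_form(alpha):
    """Given alpha whose coordinates are all odd multiples of 1/2,
    return alpha * conj(omega), which has integer coordinates and the same norm."""
    # Choose each sign of omega so that (coordinate - sign/2) is an even integer.
    omega = tuple(HALF if (x - HALF) % 2 == 0 else -HALF for x in alpha)
    return qmul(alpha, conj(omega))

alpha = (HALF, HALF, HALF, Fraction(7, 2))      # norm 1/4 + 1/4 + 1/4 + 49/4 = 13
result = integer_form(alpha)
print([int(x) for x in result])                 # [-1, 2, -2, 2]
print(sum(x * x for x in result))               # 13
```

The output corresponds to $13 = 1^2 + 2^2 + 2^2 + 2^2$, a representation with integer coordinates as the argument above predicts.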
Exercises


The proof above shows that any sum of integer squares has a nontrivial quaternion 
integer factorization. 


8.6.1 Find quaternion integer factorizations of the Gaussian primes 3, 7, and 11. 
8.6.2 Why are these Gaussian primes? 


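The properties of conjugates enumerated above can be proved using matrices
or using the multiplication rules for $i$, $j$, and $k$.


8.6.3 If $q = \begin{pmatrix} \alpha & \beta \\ -\bar{\beta} & \bar{\alpha} \end{pmatrix}$, what is $\bar{q}$?


8.6.4 Use the matrix just found for $\bar{q}$ to compute $\bar{q}_2\,\bar{q}_1$. Hence show $\overline{q_1 q_2} = \bar{q}_2\,\bar{q}_1$.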


8.7 A prime divisor property 


It was shown in Section 8.5 that $\mathbb{Z}\left[\frac{1+i+j+k}{2}, i, j, k\right]$ has the division
property, so this enables us to find the gcd of any two Hurwitz integers by the
Euclidean algorithm.

However, since the quaternion product is generally noncommutative,
we must distinguish between right and left divisors and stick to one type.
We call $\delta$ a right divisor of $\alpha$ if $\alpha = \gamma\delta$ for some $\gamma$.

So if $\alpha$ and $\beta$ have a common right divisor $\delta$ then

$$\alpha = \gamma\delta, \quad \beta = \varepsilon\delta \quad \text{for some } \gamma, \varepsilon,$$

and therefore

$$\rho = \alpha - \mu\beta = \gamma\delta - \mu\varepsilon\delta = (\gamma - \mu\varepsilon)\delta.$$

This shows that $\delta$ is also a right divisor of the remainder $\rho$ when $\alpha$ is
(right) divided by $\beta$.

Thus if we always divide on the right in the Euclidean algorithm, we
obtain the greatest common right divisor of $\alpha$ and $\beta$. We call it the right
$\gcd(\alpha, \beta)$.

It then follows, by the usual inspection of terms produced by the
Euclidean algorithm, that

$$\text{right } \gcd(\alpha, \beta) = \mu\alpha + \nu\beta$$

for some Hurwitz integers $\mu$ and $\nu$.

This allows us to prove the following prime divisor property (not the
full analogue of those for $\mathbb{Z}$ and $\mathbb{Z}[i]$ but strong enough for our purposes):
if $p$ is a real prime and if $p$ divides a Hurwitz integer product $\alpha\beta$, then $p$
divides $\alpha$ or $p$ divides $\beta$.

(It helps for $p$ to be real because reals commute with all quaternions,
hence $p$ is both a right and left divisor of everything it divides.)

As usual, the proof begins by assuming that $p$ does not divide $\alpha$. Then

$$1 = \text{right } \gcd(p, \alpha) = \mu p + \nu\alpha.$$

Multiplying both sides on the right by $\beta$ gives

$$\beta = \mu p\beta + \nu\alpha\beta.$$

Since $p$ divides $\mu p\beta$ (obviously) and $\nu\alpha\beta$ (by assumption), $p$ divides the
whole right-hand side. Hence $p$ divides $\beta$, as required. □
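The right division and right gcd used in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's own procedure: quaternions are 4-tuples, the function names are invented for the example, and the quotient $\mu$ is taken to be the Hurwitz integer nearest to $\alpha\beta^{-1}$, found by comparing the nearest all-integer point with the nearest all-half-integer point.

```python
from fractions import Fraction
from math import floor

HALF = Fraction(1, 2)

def qmul(p, q):
    """Hamilton product of quaternions stored as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def conj(q):
    a, b, c, d = q
    return (a, -b, -c, -d)

def norm(q):
    return sum(x * x for x in q)

def nearest_hurwitz(q):
    """Hurwitz integer closest to q (squared distance is at most 1/2)."""
    ints = tuple(Fraction(floor(x + HALF)) for x in q)   # nearest integer point
    halves = tuple(floor(x) + HALF for x in q)           # nearest midpoint
    def dist2(r):
        return norm(tuple(x - y for x, y in zip(q, r)))
    return min(ints, halves, key=dist2)

def right_divmod(alpha, beta):
    """Return (mu, rho) with alpha = mu*beta + rho and norm(rho) < norm(beta)."""
    n = norm(beta)
    quotient = tuple(Fraction(x) / n for x in qmul(alpha, conj(beta)))  # alpha * beta^-1
    mu = nearest_hurwitz(quotient)
    rho = tuple(x - y for x, y in zip(alpha, qmul(mu, beta)))
    return mu, rho

def right_gcd(alpha, beta):
    """Euclidean algorithm, always dividing on the right."""
    while any(beta):
        _, rho = right_divmod(alpha, beta)
        alpha, beta = beta, rho
    return alpha

# Illustrative inputs sharing the right divisor delta = 1 + i + j (norm 3).
delta = (1, 1, 1, 0)
alpha = qmul((2, 0, 0, 1), delta)
beta = qmul((1, 2, 0, 0), delta)
g = right_gcd(alpha, beta)
print(norm(g))
```

The result is determined only up to a unit factor on the left, so the sketch reports its norm rather than a canonical representative.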


8.8 Proof of the four square theorem 


We saw in Section 8.3 that the key to Lagrange’s four square theorem is
proving that every prime is the sum of four integer squares, since the four
square identity takes care of all products of primes, that is, all other natural
numbers except $1 = 0^2 + 0^2 + 0^2 + 1^2$.

The even prime $2 = 0^2 + 0^2 + 1^2 + 1^2$, so it remains to prove that any
odd prime $p$ is the sum of four integer squares. We do this with the help of
the following proposition: if $p = 2n + 1$, then there are $l, m \in \mathbb{Z}$ such that
$p$ divides $1 + l^2 + m^2$.

This is analogous to Lagrange’s lemma in Section 6.5, but easier. Here
is the proof.

The squares $x^2$, $y^2$ of any two of the numbers $l = 0, 1, 2, \ldots, n$ are
incongruent mod $p$ because

$$\begin{aligned}
x^2 \equiv y^2 \pmod{p} &\Rightarrow x^2 - y^2 \equiv 0 \pmod{p} \\
&\Rightarrow (x - y)(x + y) \equiv 0 \pmod{p} \\
&\Rightarrow x \equiv y \text{ or } x + y \equiv 0 \pmod{p},
\end{aligned}$$

and $x + y \not\equiv 0 \pmod{p}$ since $0 < x + y < p$. Thus the $n + 1$ numbers
$l = 0, 1, 2, \ldots, n$ give $n + 1$ incongruent values of $l^2$, mod $p$.

Similarly, the numbers $m = 0, 1, 2, \ldots, n$ give $n + 1$ incongruent values
of $m^2$, hence of $-m^2$, and hence of $-1 - m^2$.

But only $2n + 1$ incongruent values exist, mod $p = 2n + 1$. Therefore,
for some $l$ and $m$ we have

$$l^2 \equiv -1 - m^2 \pmod{p}.$$

That is, $p$ divides $1 + l^2 + m^2$. □
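The counting argument guarantees that a suitable pair exists, and a direct search finds one. The Python sketch below is an illustration only (the function name `find_l_m` is an assumption made for the example): it tabulates the residues $l^2$ for $0 \le l \le n$ and looks up $-1 - m^2$ among them.

```python
def find_l_m(p):
    """For an odd prime p = 2n + 1, return (l, m) with 0 <= l, m <= n
    such that p divides 1 + l^2 + m^2."""
    n = (p - 1) // 2
    # The n + 1 residues l^2 mod p are distinct, so this table loses nothing.
    square_of = {l * l % p: l for l in range(n + 1)}
    for m in range(n + 1):
        target = (-1 - m * m) % p
        if target in square_of:
            return square_of[target], m
    raise ValueError("no pair found; p is assumed to be an odd prime")

for p in (3, 5, 7, 11, 13):
    l, m = find_l_m(p)
    print(p, l, m, (1 + l * l + m * m) % p == 0)
```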
Four square theorem. Every natural number is the sum of four squares. 


Proof. By the remarks above, it remains to prove the theorem for any odd
prime $p$, which we have just shown to divide a number $1 + l^2 + m^2$.

To complete the proof we factorize $1 + l^2 + m^2$ into the product of
Hurwitz integers

$$(1 - li - mj)(1 + li + mj)$$

and apply the prime divisor property from the last section. If $p$ is a Hurwitz
prime, then $p$ divides $1 - li - mj$ or $p$ divides $1 + li + mj$. But neither
conclusion is true, because neither

$$\frac{1}{p} - \frac{l}{p}i - \frac{m}{p}j \quad \text{nor} \quad \frac{1}{p} + \frac{l}{p}i + \frac{m}{p}j$$

is a Hurwitz integer. Hence our arbitrary odd prime $p$ is not a Hurwitz
prime, and therefore, by Section 8.6,

$$p = A^2 + B^2 + C^2 + D^2 \quad \text{with } A, B, C, D \in \mathbb{Z},$$

as required for the four square theorem. □


Exercises 


It follows from the four square theorem that any natural number has a Hurwitz 
integer factorization. 


8.8.1 Explain why. (Does it matter if some of the squares are zero?)


Thus it is no longer any surprise that some real Gaussian primes are not Hurwitz
primes: none of them are. However, we can still ask about the proper complex
Gaussian primes $a + bi$ with $a, b \neq 0$.


8.8.2 Explain why the quaternions of the form $a + bi$, for $a, b \in \mathbb{R}$, can be
identified with the complex numbers $a + b\sqrt{-1}$.


8.8.3 Show that a proper Gaussian prime $a + bi$ is also a Hurwitz prime.

So far, we have said nothing about sums of three squares because their story
is not so complete or elegant. For a start, there is no three square identity because
a sum of three squares times a sum of three squares is not always a sum of three
squares.


8.8.4 Find the natural numbers less than 20 that are not sums of three squares, 
and hence find one that factorizes into two sums of three squares. 


8.8.5 Work out the possible values of $x^2$ mod 8, and hence show that no natural
number of the form $8n + 7$ is a sum of three squares.


With a little more work we can prove the more general result that no natural
number of the form $4^m(8n + 7)$ is a sum of three squares.


8.8.6 By considering the values of squares mod 4 show that

$$x^2 + y^2 + z^2 \equiv 0 \pmod{4}$$

is possible only when $x$, $y$, $z$ are all even.


8.8.7 Deduce from Exercise 8.8.6 that if $4^m(8n + 7)$ is a sum of three squares,
then so is $4^{m-1}(8n + 7)$.


8.8.8 Exercises 8.8.7 and 8.8.5 imply that no natural number $4^m(8n + 7)$ is a sum
of three squares. Why?


The happy ending to this story is that the numbers $4^m(8n + 7)$ are precisely
those that are not sums of three squares. This was first proved by Legendre, and
a proof may be found in Mordell (1969), pp. 175-178. As Mordell remarks “no
really elementary treatment is known”.
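Legendre's characterization is easy to test over a finite range, which is a useful sanity check while working the exercises. The Python sketch below is illustrative only: the function names and the bound 500 are assumptions made for the example, and squares equal to zero are allowed.

```python
from math import isqrt

def is_sum_of_three_squares(N):
    """True if N = x^2 + y^2 + z^2 for integers x, y, z >= 0 (brute force)."""
    for x in range(isqrt(N) + 1):
        for y in range(x, isqrt(N - x * x) + 1):
            z2 = N - x * x - y * y
            if isqrt(z2) ** 2 == z2:
                return True
    return False

def excluded_form(N):
    """True if N has the form 4^m (8n + 7)."""
    while N % 4 == 0:
        N //= 4
    return N % 8 == 7

print([N for N in range(1, 20) if not is_sum_of_three_squares(N)])   # [7, 15]
print(all(is_sum_of_three_squares(N) != excluded_form(N) for N in range(1, 500)))
```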


8.9 Discussion 


Hurwitz’ application of quaternions to the four square theorem was a his- 
torically natural event, very much like Dedekind’s application of Gaussian 
integers to the two square theorem. In both cases a sum-of-squares identity 
was discovered first, followed considerably later by the discovery of gen- 
eralized numbers with a multiplicative norm (the multiplicative property 
being just a restatement of the sum-of-squares identity). Finally, appropri- 
ate “integers” and “primes” among the generalized numbers are found to 
explain the representation of ordinary integers as sums of squares. 

The historical parallel between the complex numbers $\mathbb{C}$ and the quaternions
$\mathbb{H}$ is even stronger than this, because both stories have a similar missing
link I have not yet mentioned. The discovery of the sum of squares
identity led to the creation of the generalized numbers via an algebraic
analysis of rotations.

In the case of complex numbers the story is briefly this.


• Diophantus (around 200 CE) observed the identity

  $$(a_1^2 + b_1^2)(a_2^2 + b_2^2) = (a_1 a_2 - b_1 b_2)^2 + (a_1 b_2 + b_1 a_2)^2$$

  and interpreted it as a rule for taking two right-angled triangles, with
  side pairs $(a_1, b_1)$ and $(a_2, b_2)$, and generating a third triangle, with
  side pair $(a_1 a_2 - b_1 b_2,\ a_1 b_2 + b_1 a_2)$, whose hypotenuse is the product
  of those in the triangles $(a_1, b_1)$ and $(a_2, b_2)$.


• Viète (1593) noticed that the angle in the third triangle is the sum of
  the angles in the first two. In our notation, this is because the ratio
  of sides in the third triangle is

  $$\frac{a_1 b_2 + b_1 a_2}{a_1 a_2 - b_1 b_2} = \frac{\frac{b_1}{a_1} + \frac{b_2}{a_2}}{1 - \frac{b_1}{a_1}\frac{b_2}{a_2}} = \tan(\theta_1 + \theta_2),$$

  where $\theta_1 = \tan^{-1}\frac{b_1}{a_1}$ and $\theta_2 = \tan^{-1}\frac{b_2}{a_2}$ are the angles in the first two.


• In the 18th century Cotes, de Moivre, and others rediscovered the
  angle-addition property by formally multiplying $\cos\theta + i\sin\theta$ and
  $\cos\varphi + i\sin\varphi$ to obtain $\cos(\theta + \varphi) + i\sin(\theta + \varphi)$. Multiplication
  of complex numbers of norm 1 therefore represents rotation of the
  plane about the origin. This, and the more obvious interpretation
  of addition as vector addition, led to the identification of complex
  numbers with points of the plane by Wessel (1797), Argand (1806),
  and (with more authority) by Gauss.


• Hamilton (1835) defined complex numbers to be pairs $(a, b)$ of real
  numbers with addition and multiplication defined by

  $$(a_1, b_1) + (a_2, b_2) = (a_1 + a_2,\ b_1 + b_2)$$

  $$(a_1, b_1) \times (a_2, b_2) = (a_1 a_2 - b_1 b_2,\ a_1 b_2 + b_1 a_2).$$


Of course, Hamilton in 1835 was operating with 20/20 hindsight about
the complex numbers, so he knew that these definitions of addition and
multiplication would have all the usual algebraic properties, and that the
function $\mathrm{norm}((a, b)) = a^2 + b^2$ would be multiplicative. However, he
hoped that by rewriting the history of complex numbers in this way he
would see how to multiply triples. He hoped in fact to find a multiplication
rule for $n$-tuples that made their norm multiplicative, where

$$\mathrm{norm}((a_1, a_2, \ldots, a_n)) = a_1^2 + a_2^2 + \cdots + a_n^2.$$

But a multiplicative norm for $n$-tuples implies a sum of $n$ squares identity,
so it would have been wise to look for a sum of three squares identity first.

This did not happen. Instead, Hamilton spent 13 years trying in vain
to find a multiplication rule for triples. Virtually all that he learned from
his search was that the commutative law of multiplication might have to be
abandoned. When he also abandoned triples, and tried quadruples,
everything fell into place. On October 16 1843 he wrote down the rules

$$i^2 = j^2 = k^2 = ijk = -1$$

that define quaternion multiplication and from them derived the four square
identity. Only then did he start to catch up on the news: that Euler knew
the four square identity in 1748, that Legendre knew that there is no three
square identity, and that quaternion multiplication had already been used
by Rodrigues in 1840 to compute the product of rotations in $\mathbb{R}^3$.

Of course, these earlier discoveries were mere glimpses of the complete
and beautiful structure discovered by Hamilton. The quaternions are even
more remarkable than he knew, because after his death it was shown that
“multiplying $n$-tuples” is possible only for $n = 1, 2, 4$, and 8. To be precise,
these are the only $n$ for which $\mathbb{R}^n$ has a multiplication that distributes over
vector addition, and a multiplicative norm. A related result, due to Hurwitz,
is that an $n$ square identity exists only for $n = 1, 2, 4$, and 8.

For $n = 1, 2, 4$ the corresponding structures are $\mathbb{R}$, $\mathbb{C}$, $\mathbb{H}$, and for $n = 8$
the corresponding structure is called the octonions. It was discovered by
Hamilton’s friend John Graves, just months after the discovery of
quaternions, and is based on an eight square identity. Like the quaternions, the
octonions do not have commutative multiplication; their multiplication is
not associative either. More on these generalized number systems may be
found in the excellent book Numbers by Ebbinghaus et al. (1991).

Like the quaternions themselves, the Hurwitz integers had an interesting
precursor in geometry. In 1852 Schläfli discovered that there are two
exceptional dimensions $n$ for which $\mathbb{R}^n$ can be “tiled” by regular figures
other than cubes. They are $n = 2$, where the exceptional tilings are by
equilateral triangles or by regular hexagons, and $n = 4$. In $\mathbb{R}^2 = \mathbb{C}$ the two
exceptional tilings can both be derived from the Eisenstein integers $\mathbb{Z}[\zeta_3]$.
The triangle tiling is obtained by joining each integer point to its nearest
neighbors and the hexagon tiling by taking each integer point as the center
of a region whose sides are midway between neighboring integer points.
In $\mathbb{R}^4$ the two exceptional tilings are obtained in the same way, from none
other than the Hurwitz integers. For more on these remarkable tilings, see
Coxeter (1948).


PREVIEW 


Fermat’s remarkable discovery that odd primes of the quadratic form
$x^2 + y^2$ are in fact those of the linear form $4n + 1$ led to the more
general problem of describing primes of the form $x^2 + dy^2$ for nonsquare
integers $d$. Is it true, for each $d$, that the primes of the form $x^2 + dy^2$
are those of a finite number of linear forms?


Fermat found such forms for the primes $x^2 + 2y^2$ and $x^2 + 3y^2$ as
well. In each case a crucial step in determining the linear forms of
the primes $x^2 + dy^2$ is to find the quadratic character of $-d$, that is,
to find the primes $q$ such that $-d$ is a square, mod $q$.


The law of quadratic reciprocity answers all such questions. The
law describes when $p$ is a square, mod $q$, for odd primes $p$ and $q$;
and its supplements deal with the cases $p = -1$ and $p = 2$.


To prove it, we first prove Euler’s criterion, which states that $p$ is
a square mod $q$ $\Leftrightarrow$ $p^{\frac{q-1}{2}} \equiv 1 \pmod{q}$. This yields the supplements
fairly easily, and it also helps in the proof of the law itself.


We also need the Chinese remainder theorem. It is of interest in
itself and for what it says about the Euler $\varphi$ function, but our main
purpose is to use it to prove quadratic reciprocity for odd primes $p$
and $q$.


To discuss quadratic reciprocity tersely we use Legendre’s symbol
$\left(\frac{P}{q}\right)$, which equals 1 when $P$ is a square mod $q$, and $-1$ otherwise.
All values of $\left(\frac{P}{q}\right)$ follow from the values $\left(\frac{p}{q}\right)$ for odd primes $p$
(from quadratic reciprocity) and the special values $\left(\frac{-1}{q}\right)$ and $\left(\frac{2}{q}\right)$
(from the supplements).


9.1 Primes $x^2 + y^2$, $x^2 + 2y^2$, and $x^2 + 3y^2$


The primes $x^2 + y^2$ again


In the proof of Fermat’s two square theorem, that an odd prime $p$ is of the
form $x^2 + y^2$ if and only if $p$ is of the form $4n + 1$, a key step was showing
that any prime $p = 4n + 1$ divides a number of the form $m^2 + 1$. We proved
this in Section 6.5 using Wilson’s theorem to construct a suitable $m$.

We now re-examine this step to see how it might be generalized. The
statement that $p$ divides $m^2 + 1$ is equivalent to

$$-1 \equiv m^2 \pmod{p},$$

in other words $-1$ is a square, mod $p = 4n + 1$. And indeed our proof was
to take the expression for $-1$ given by Wilson’s theorem and show that it
was in fact a square, mod $p = 4n + 1$.

This raises the general question of whether $q$ is a square mod $p$, where
$p$ and $q$ are arbitrary integers. We also state the question as: what is the
quadratic character of $q$, mod $p$? Several problems lead to this question,
as we now show.


The form $x^2 + 2y^2$


After describing the primes of the form $x^2 + y^2$, Fermat tackled primes of
the form $x^2 + 2y^2$. He claimed that

$$p = x^2 + 2y^2 \iff p = 8n + 1 \text{ or } p = 8n + 3.$$

A proof can be given along the same lines as our proof of the two square
theorem.

We work in $\mathbb{Z}[\sqrt{-2}]$, and prove first that if $p$ is an ordinary prime that
is not a prime of $\mathbb{Z}[\sqrt{-2}]$ then

$$p = a^2 + 2b^2 \quad \text{for some } a, b \in \mathbb{Z}.$$

The proof is like that for non-Gaussian primes (Section 6.3). If $p$ is not a
prime of $\mathbb{Z}[\sqrt{-2}]$ then it has factors of norm $> 1$, say

$$p = (a + b\sqrt{-2})\gamma.$$

Multiplying this equation by its conjugate leads to $p = a^2 + 2b^2$.

The key step now is to prove that any prime $p = 8n + 1$ or $8n + 3$ divides
a number of the form $m^2 + 2$. Once this is done, one uses the factorization

$$m^2 + 2 = (m - \sqrt{-2})(m + \sqrt{-2})$$

and unique prime factorization in $\mathbb{Z}[\sqrt{-2}]$ to complete the proof as we did
for $\mathbb{Z}[i]$ (Section 6.5).

The claim that $p$ divides a number of the form $m^2 + 2$ is equivalent to

$$-2 \equiv m^2 \pmod{p}.$$

Thus we have to prove that $-2$ is a square, mod $p$, when $p$ is a prime of
the form $8n + 1$ or $8n + 3$.


The form $x^2 + 3y^2$


Next, Fermat described the primes of the form $x^2 + 3y^2$:

$$p = x^2 + 3y^2 \iff p = 3n + 1.$$

This can be proved along the same lines as for $x^2 + y^2$ and $x^2 + 2y^2$, this
time using factorizations in $\mathbb{Z}[\zeta_3]$. The awkward step is to prove that
any prime $p = 3n + 1$ divides a number of the form $m^2 + 3$. Equivalently,

$$-3 \equiv m^2 \pmod{p},$$

so we now have to prove that $-3$ is a square, mod $p = 3n + 1$.
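Fermat's three claims can be tested by brute force over a range of primes. The Python sketch below is illustrative only: the function names, the bound 500, and the decision to start at $p = 5$ are assumptions made for the example. The primes 2 and 3 are left out because $2 = 0^2 + 2 \cdot 1^2$ and $3 = 0^2 + 3 \cdot 1^2$ are special cases that fall outside the stated congruence classes.

```python
from math import isqrt

def is_prime(n):
    """Trial division; adequate for the small range used here."""
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

def represented(p, d):
    """True if p = x^2 + d*y^2 for some integers x, y >= 0."""
    for y in range(isqrt(p // d) + 1):
        x2 = p - d * y * y
        if isqrt(x2) ** 2 == x2:
            return True
    return False

# d -> (modulus, allowed residues), as claimed in the text.
claims = {1: (4, {1}), 2: (8, {1, 3}), 3: (3, {1})}
primes = [p for p in range(5, 500) if is_prime(p)]

for d, (modulus, residues) in claims.items():
    ok = all(represented(p, d) == (p % modulus in residues) for p in primes)
    print(f"x^2 + {d}y^2 versus residues {sorted(residues)} mod {modulus}: {ok}")
```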


Exercises 


In a letter to Frenicle on 15 June 1641, Fermat asked which natural numbers are
the sums of the two smaller members of a Pythagorean triple. These are the
numbers of the form $2XY + X^2 - Y^2 = (X + Y)^2 - 2Y^2$, and Frenicle correctly replied
that the primes among the numbers $x^2 - 2y^2$ are precisely those of the form $8n \pm 1$.
This can be proved in the same way as Fermat’s results about $x^2 + y^2$, $x^2 + 2y^2$,
$x^2 + 3y^2$ by using

• conjugation in $\mathbb{Z}[\sqrt{2}]$,

• the quadratic character of 2,

• unique prime factorization in $\mathbb{Z}[\sqrt{2}]$.


The quadratic character of 2 will be established in Section 9.4, but the remaining
steps can be done here.

9.1.1 Suppose $p$ is an ordinary prime but not prime in $\mathbb{Z}[\sqrt{2}]$, so $p = (a + b\sqrt{2})\gamma$,
where $a + b\sqrt{2}$ and $\gamma$ have norm of absolute value greater than 1. By taking
conjugates of both sides, show that $p = a^2 - 2b^2$.


9.1.2 Using the fact that $x^2 \equiv 0, 1, 4 \pmod{8}$ show that all odd primes of the form
$x^2 - 2y^2$ are of the form $8n \pm 1$.

One now uses the quadratic character of 2 (see Section 9.4) to prove that any
prime of the form $8n \pm 1$ divides a natural number of the form $m^2 - 2$. Assuming
also a prime divisor property in $\mathbb{Z}[\sqrt{2}]$, the argument continues as follows.

9.1.3 Show that, if $p$ divides $m^2 - 2 = (m - \sqrt{2})(m + \sqrt{2})$, then $p$ is not a prime
of $\mathbb{Z}[\sqrt{2}]$. (Hence $p$ is of the form $x^2 - 2y^2$ by Exercise 9.1.1.)

The only information now missing, apart from the quadratic character of 2, is
a proof that $\mathbb{Z}[\sqrt{2}]$ has the prime divisor property. This is obtained by showing
that $\mathbb{Z}[\sqrt{2}]$ has the division property, and hence a Euclidean algorithm.

The norm $a^2 - 2b^2$ of $a + b\sqrt{2} \in \mathbb{Z}[\sqrt{2}]$ is not $|a + b\sqrt{2}|^2$, so the geometric
argument used for $\mathbb{Z}[i]$ and $\mathbb{Z}[\sqrt{-2}]$ does not apply, and we opt for a purely
algebraic approach. First we state the division property of $\mathbb{Z}[\sqrt{2}]$ as follows.


If $\alpha, \beta \in \mathbb{Z}[\sqrt{2}]$ and $\beta \neq 0$, then there are $\mu, \rho \in \mathbb{Z}[\sqrt{2}]$ with
$\alpha = \mu\beta + \rho$ and $|\mathrm{norm}(\rho)| < |\mathrm{norm}(\beta)|$.


9.1.4 Show that the division property is implied by the existence of a $\mu \in \mathbb{Z}[\sqrt{2}]$
with $\left|\mathrm{norm}\left(\frac{\alpha}{\beta} - \mu\right)\right| < 1$. (We are now extending the norm to $\mathbb{Q}(\sqrt{2})$. Is
this OK?)

9.1.5 If $\alpha, \beta \in \mathbb{Z}[\sqrt{2}]$ and $\beta \neq 0$, show by “rationalizing the denominator” that

$$\frac{\alpha}{\beta} = \frac{A_1}{\mathrm{norm}(\beta)} + \frac{A_2}{\mathrm{norm}(\beta)}\sqrt{2} \quad \text{for some } A_1, A_2 \in \mathbb{Z}.$$

9.1.6 Continuing with the notation of Exercise 9.1.5, if

$$m_1 = \text{nearest integer to } \frac{A_1}{\mathrm{norm}(\beta)}, \qquad m_2 = \text{nearest integer to } \frac{A_2}{\mathrm{norm}(\beta)},$$

and $\mu = m_1 + m_2\sqrt{2}$, show that $\left|\mathrm{norm}\left(\frac{\alpha}{\beta} - \mu\right)\right| < 1$, so $\mathbb{Z}[\sqrt{2}]$ has the
division property.
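The recipe of Exercises 9.1.5 and 9.1.6 translates directly into code. The Python sketch below is an illustration under stated assumptions: an element $a + b\sqrt{2}$ is stored as the pair `(a, b)`, the function names are invented for the example, and a tie in "nearest integer" is rounded upward.

```python
from fractions import Fraction
from math import floor

def norm(z):
    """Norm a^2 - 2b^2 of a + b*sqrt(2), stored as the pair (a, b)."""
    a, b = z
    return a * a - 2 * b * b

def divide(alpha, beta):
    """Return (mu, rho) with alpha = mu*beta + rho and |norm(rho)| < |norm(beta)|."""
    a1, b1 = alpha
    a2, b2 = beta
    n = norm(beta)                     # nonzero whenever beta != 0
    # Rationalize: alpha * conj(beta) = A1 + A2*sqrt(2), where conj(beta) = a2 - b2*sqrt(2).
    A1 = a1 * a2 - 2 * b1 * b2
    A2 = b1 * a2 - a1 * b2
    m1 = floor(Fraction(A1, n) + Fraction(1, 2))   # nearest integer to A1/norm(beta)
    m2 = floor(Fraction(A2, n) + Fraction(1, 2))   # nearest integer to A2/norm(beta)
    # rho = alpha - mu*beta, with mu*beta = (m1*a2 + 2*m2*b2) + (m1*b2 + m2*a2)*sqrt(2).
    rho = (a1 - (m1 * a2 + 2 * m2 * b2), b1 - (m1 * b2 + m2 * a2))
    return (m1, m2), rho

mu, rho = divide((17, 5), (3, 1))
print(mu, rho, abs(norm(rho)) < abs(norm((3, 1))))
```

Because each coordinate of $\frac{\alpha}{\beta} - \mu$ is at most $\frac{1}{2}$ in absolute value, its norm lies between $-\frac{1}{2}$ and $\frac{1}{4}$, which is why the remainder always has smaller norm in absolute value.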


9.2 Statement of quadratic reciprocity 


In the mid-18th century Euler realized that knowing the primes of forms
such as $x^2 + y^2$, $x^2 + 2y^2$ and $x^2 + 3y^2$ depends on knowing whether $p$ is a
square mod $q$, for certain integers $p$ and $q$. In the case where $p$ and $q$ are
both odd primes he conjectured that the answer is:


When $p$ and $q$ are both of the form $4n + 3$, then
    $p$ is a square, mod $q$ $\Leftrightarrow$ $q$ is not a square, mod $p$.
Otherwise
    $p$ is a square, mod $q$ $\Leftrightarrow$ $q$ is a square, mod $p$.


Because of the reciprocal relationship between $p$ and $q$, this statement
is called the law of quadratic reciprocity. (The word “quadratic” in this
case really means “square”. In the literature one often finds the old term
“quadratic residues mod $p$” for “squares mod $p$”.)

Euler was unable to prove the law of quadratic reciprocity. The first
proofs were given by Gauss in 1801. Since then nearly 200 different proofs
have been given, making quadratic reciprocity the second most proved
theorem in mathematics, after Pythagoras’ theorem.


Notation and examples 


In Section 9.8 we give a recent proof of quadratic reciprocity, which
simplifies one of Gauss. But first we introduce some notation and give examples
of its use.

For any primes $p$ and $q$ the Legendre symbol or quadratic character
symbol is defined by

$$\left(\frac{p}{q}\right) = \begin{cases} 1 & \text{if } p \text{ is a square mod } q \\ -1 & \text{if } p \text{ is not a square mod } q. \end{cases}$$

With the help of this symbol, quadratic reciprocity can be stated very
concisely as

$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}}$$

for distinct odd primes $p$ and $q$.
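The concise form of the law can be checked directly for small primes. The Python sketch below is illustrative only: `legendre` is a brute-force helper written for this example, and the list of primes is an arbitrary sample.

```python
def legendre(p, q):
    """Legendre symbol (p/q) for an odd prime q and p not divisible by q,
    found by listing the nonzero squares mod q."""
    squares = {x * x % q for x in range(1, q)}
    return 1 if p % q in squares else -1

odd_primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

law_holds = all(
    legendre(p, q) * legendre(q, p) == (-1) ** (((p - 1) // 2) * ((q - 1) // 2))
    for p in odd_primes for q in odd_primes if p != q
)
print(law_holds)
```

The exponent $\frac{p-1}{2} \cdot \frac{q-1}{2}$ is odd exactly when both primes have the form $4n + 3$, which recovers Euler's two-case statement.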


The Legendre symbol may be extended to $\left(\frac{P}{q}\right)$ for any integer $P$, and
it is $\pm 1$ according as $P$ is a square mod $q$ or not. This is possible by the
multiplicative property

$$\left(\frac{p_1}{q}\right)\left(\frac{p_2}{q}\right)\cdots\left(\frac{p_k}{q}\right) = \left(\frac{p_1 p_2 \cdots p_k}{q}\right)$$

where $p_1 p_2 \cdots p_k$ is the prime factorization of $P$ (possibly including the
prime 2 and the unit $-1$). To evaluate the possible factors $\left(\frac{-1}{q}\right)$ and $\left(\frac{2}{q}\right)$
arising from this prime factorization, we need the so-called supplements to
quadratic reciprocity,

$$\left(\frac{-1}{q}\right) = 1 \iff q = 4n + 1, \qquad \text{(I)}$$

$$\left(\frac{2}{q}\right) = 1 \iff q = 8n \pm 1. \qquad \text{(II)}$$

These, and the multiplicative property, are proved in the next few sections.
We now use them to prove properties of squares sought in Section 9.1.


Examples 


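To show that $-2$ is a square mod $p = 8n + 1$ or $8n + 3$ we calculate

$$\begin{aligned}
\left(\frac{-2}{8n+1}\right) &= \left(\frac{-1}{8n+1}\right)\left(\frac{2}{8n+1}\right) && \text{by the multiplicative property} \\
&= 1 \times 1 = 1 && \text{by supplements (I) and (II).}
\end{aligned}$$

$$\begin{aligned}
\left(\frac{-2}{8n+3}\right) &= \left(\frac{-1}{8n+3}\right)\left(\frac{2}{8n+3}\right) && \text{by the multiplicative property} \\
&= (-1) \times (-1) = 1 && \text{by supplements (I) and (II).}
\end{aligned}$$

To show that $-3$ is a square mod $p = 3n + 1$:

$$\left(\frac{-3}{3n+1}\right) = \left(\frac{-1}{3n+1}\right)\left(\frac{3}{3n+1}\right) \quad \text{by the multiplicative property}$$

$$= 1 \times \left(\frac{3n+1}{3}\right) \text{ or } (-1) \times (-1)\left(\frac{3n+1}{3}\right),$$

by supplement (I) and quadratic reciprocity,
with the $+$ or $-$ sign according as $3n + 1$ is of the form $4n' + 1$ or not,

$$= 1 \times \left(\frac{1}{3}\right) \text{ or } (-1) \times (-1)\left(\frac{1}{3}\right) \quad \text{since } 3n + 1 \equiv 1 \pmod{3}$$

$$= 1 \times 1 \text{ or } (-1) \times (-1) = 1, \quad \text{since 1 is a square mod 3.}$$

Exercises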


Note that the quadratic character of 2, given by supplement (II), is precisely what
we need to fill the gap in the last exercise set, proving that the odd primes of the
form $x^2 - 2y^2$ are those of the form $8n \pm 1$. We now use the quadratic character of
3 in a similar way to characterize the primes of the form $x^2 - 3y^2$.


9.2.1 Suppose $p$ is an ordinary prime but not prime in $\mathbb{Z}[\sqrt{3}]$, so $p = (a + b\sqrt{3})\gamma$,
where $a + b\sqrt{3}$ and $\gamma$ have norm of absolute value greater than 1. By taking
conjugates of both sides, show that $p = a^2 - 3b^2$.


9.2.2 Use congruence mod 12 to show that all odd primes of the form $x^2 - 3y^2$
are of the form $12n + 1$.


9.2.3 Use quadratic reciprocity to show that 3 is a square mod any prime
$p = 12n + 1$, and hence that such a $p$ divides a natural number of the form
$m^2 - 3$.


9.2.4 Check that the argument of Exercise 9.1.6 also works for $\mathbb{Z}[\sqrt{3}]$, so $\mathbb{Z}[\sqrt{3}]$
has the prime divisor property.


9.2.5 Use Exercises 9.2.3 and 9.2.4, and $m^2 - 3 = (m - \sqrt{3})(m + \sqrt{3})$, to prove
that $p = 12n + 1$ is not a prime of $\mathbb{Z}[\sqrt{3}]$. Conclude, by Exercise 9.2.1, that
$p$ is of the form $x^2 - 3y^2$.


Thus the form $x^2 - 3y^2$ represents all the odd primes of the form $12n + 1$. It
does not represent the even prime 2, as this would contradict the quadratic
character of 2.


9.2.6 Show that an integer solution of $x^2 - 3y^2 = 2$ implies that 2 is a square
modulo a prime not allowed by supplement (II).

Likewise, the quadratic character of $-1$ gives another solution of Exercise 5.8.6.


9.2.7 Show that an integer solution of $x^2 - 3y^2 = -1$ implies that $-1$ is a square
modulo a prime not allowed by supplement (I).


9.3 Euler’s criterion 


If $q$ is prime and $a \not\equiv 0 \pmod{q}$, then $a^{q-1} \equiv 1 \pmod{q}$ by Fermat’s little
theorem. Euler used this to derive the following formula:


Euler’s criterion. For an odd prime $q$, $\left(\frac{p}{q}\right) \equiv p^{\frac{q-1}{2}} \pmod{q}$, and hence $p$
is a square mod $q$ $\Leftrightarrow$ $p^{\frac{q-1}{2}} \equiv 1 \pmod{q}$.




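Proof. First suppose that $p$ is a square mod $q$, say $p \equiv a^2 \pmod{q}$. Then
$\left(\frac{p}{q}\right) = 1$ by definition and

$$p^{\frac{q-1}{2}} \equiv a^{q-1} \equiv 1 \pmod{q} \quad \text{by Fermat’s little theorem.}$$

Conversely, if $p$ is not a square mod $q$, it suffices to show that

$$p^{\frac{q-1}{2}} \not\equiv 1 \pmod{q}.$$

This is so because $x = p^{\frac{q-1}{2}}$ satisfies $x^2 = p^{q-1} \equiv 1 \pmod{q}$ by Fermat’s
little theorem, and $x^2 \equiv 1$ has only the two solutions $x \equiv \pm 1$ by Lagrange’s
polynomial congruence theorem.

By the same theorem, $p^{\frac{q-1}{2}} \equiv 1 \pmod{q}$ has at most $\frac{q-1}{2}$ solutions, and
we know that they include the squares $p = 1^2, 2^2, \ldots, \left(\frac{q-1}{2}\right)^2$. These $\frac{q-1}{2}$
squares are distinct. Indeed, if $x^2$ and $y^2$ are any two of them we have

$$\begin{aligned}
x^2 \equiv y^2 \pmod{q} &\Rightarrow x^2 - y^2 \equiv 0 \pmod{q} \\
&\Rightarrow (x - y)(x + y) \equiv 0 \pmod{q} \\
&\Rightarrow x \equiv y \pmod{q}.
\end{aligned}$$

This is so because $1 < x + y < q$ and hence $x + y \not\equiv 0 \pmod{q}$.

Thus when $p \not\equiv a^2 \pmod{q}$ we have $p^{\frac{q-1}{2}} \not\equiv 1 \pmod{q}$ and therefore

$$p^{\frac{q-1}{2}} \equiv -1 = \left(\frac{p}{q}\right) \pmod{q}. \qquad \square$$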
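Euler's criterion also gives a fast way to compute the symbol, since modular exponentiation avoids listing the squares. The Python sketch below is an illustration only: both function names are assumptions made for the example, and the comparison runs over a small sample of odd primes.

```python
def legendre_brute(P, q):
    """(P/q) from the definition: is P congruent to a nonzero square mod q?"""
    return 1 if any(x * x % q == P % q for x in range(1, q)) else -1

def legendre_euler(P, q):
    """(P/q) from Euler's criterion: P^((q-1)/2) mod q is 1 or q - 1."""
    r = pow(P, (q - 1) // 2, q)        # built-in modular exponentiation
    return -1 if r == q - 1 else r

odd_primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
agree = all(legendre_brute(P, q) == legendre_euler(P, q)
            for q in odd_primes for P in range(1, q))
print(agree)
```

Both functions assume that $q$ is an odd prime and that $P \not\equiv 0 \pmod{q}$; no check is made for either condition.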


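Notice that the proof of this criterion does not assume that $p$ is actually
prime. We take this opportunity to define $\left(\frac{P}{q}\right)$, for any $P \not\equiv 0 \pmod{q}$, to
be 1 if $P$ is a square mod $q$ and $-1$ otherwise. Then the Euler criterion
gives an easy proof of the following.


Multiplicative property of $\left(\frac{P}{q}\right)$. For any $P_1, P_2 \not\equiv 0 \pmod{q}$

$$\left(\frac{P_1}{q}\right)\left(\frac{P_2}{q}\right) = \left(\frac{P_1 P_2}{q}\right).$$

Proof. Euler’s criterion says that

$$\left(\frac{P_1}{q}\right) \equiv P_1^{\frac{q-1}{2}} \pmod{q}, \qquad \left(\frac{P_2}{q}\right) \equiv P_2^{\frac{q-1}{2}} \pmod{q},$$

and therefore

$$\begin{aligned}
\left(\frac{P_1}{q}\right)\left(\frac{P_2}{q}\right) &\equiv P_1^{\frac{q-1}{2}} P_2^{\frac{q-1}{2}} \pmod{q} && \text{by multiplication of congruences} \\
&\equiv (P_1 P_2)^{\frac{q-1}{2}} \pmod{q} \\
&\equiv \left(\frac{P_1 P_2}{q}\right) \pmod{q} && \text{by Euler’s criterion again.} \qquad \square
\end{aligned}$$

The proof of the multiplicative property also does not assume that the
integers $P$ are prime. So we can evaluate $\left(\frac{P}{q}\right)$ for any integer $P$, provided
we know $\left(\frac{p}{q}\right)$ for factors $p$ of $P$. We can assume that the factors are among
$-1$, 2 and the odd primes.

The law of quadratic reciprocity (proved in Section 9.8) gives
information about $\left(\frac{p}{q}\right)$ for odd primes $p$, so we also need information about
$\left(\frac{-1}{q}\right)$ and $\left(\frac{2}{q}\right)$. We obtain this from the supplements to quadratic
reciprocity, proved here and in the next section, which give the values of $\left(\frac{-1}{q}\right)$
and $\left(\frac{2}{q}\right)$ directly. (We previously gave another determination of $\left(\frac{-1}{q}\right)$ in
Section 6.7.)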


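The value of $\left(\frac{-1}{q}\right)$. For an odd prime $q$

$$\left(\frac{-1}{q}\right) = \begin{cases} 1 & \text{if } q = 4n + 1 \\ -1 & \text{if } q = 4n + 3. \end{cases}$$

Proof. Euler’s criterion says that

$$\left(\frac{-1}{q}\right) \equiv (-1)^{\frac{q-1}{2}} \pmod{q}.$$

So if $q = 4n + 1$ we have

$$\left(\frac{-1}{q}\right) \equiv (-1)^{2n} = 1 \pmod{q},$$

and if $q = 4n + 3$ we have

$$\left(\frac{-1}{q}\right) \equiv (-1)^{2n+1} = -1 \pmod{q}. \qquad \square$$

Exercises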


If one assumes the existence of a primitive root for a prime $q$ (proved in Section
3.9), then it is possible to give a simpler proof of Euler’s criterion.


9.3.1 If $a$ is a primitive root for $q$, so that $1, a, a^2, \ldots, a^{q-2}$ are the distinct nonzero
elements mod $q$, show that the squares mod $q$ are $1, a^2, a^4, \ldots, a^{q-3}$.


9.3.2 Deduce from Exercise 9.3.1 that $b$ is a square, mod $q$ $\Leftrightarrow$ $b^{\frac{q-1}{2}} \equiv 1 \pmod{q}$.


Another easy consequence of Exercise 9.3.1 is a “half and half” property of
squares mod $q$.


9.3.3 Show that exactly half of $1, 2, 3, \ldots, q - 1$ are squares mod $q$.


The “half and half” property can also be proved without assuming the
existence of a primitive root, though not quite so easily. The proof of Euler’s criterion
shows that at least half of $1, 2, \ldots, q - 1$ are squares. Then it suffices to prove the
following:


9.3.4 Show that $1^2, 2^2, 3^2, \ldots, (q - 1)^2$ include at most half of the numbers $\not\equiv 0$
(mod $q$).


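9.4 The value of $\left(\frac{2}{q}\right)$


Euler’s criterion says that $\left(\frac{2}{q}\right) \equiv 2^{\frac{q-1}{2}} \pmod{q}$, but $2^{\frac{q-1}{2}}$ is harder to
evaluate than $(-1)^{\frac{q-1}{2}}$. Fermat seems to have known $\left(\frac{2}{q}\right)$ (see exercises to
Section 9.1), but we do not know how. It turns out that

$$2^{\frac{q-1}{2}} \equiv \begin{cases} (-1)^{\frac{q-1}{4}} \pmod{q} & \text{if } q = 4n + 1 \\ (-1)^{\frac{q+1}{4}} \pmod{q} & \text{if } q = 4n + 3. \end{cases}$$

We can prove this by manipulating the product $1 \times 2 \times 3 \times \cdots \times (q - 1)$
(mod $q$), a little like the manipulation in Section 6.5 that yielded the quadratic
character of $-1$ in Section 6.7.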

When g = 4n + 1 the manipulation takes 2 out of half of the factors, —1 
out of one quarter of them, and rewrites the resulting negative factors mod 
4n +1 to make them positive. This restores the product we started with, 
which can then be cancelled from both sides. 
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1_x2xK---x4n 


=(1x3x---x (4n—1))x (2x 4x--- x 4n) (mod q) 
=(1x3x--+x (4n—1))x (1K 2x +++ x 2n)2" (mod q) 
=(1x3x--- x (2n—1)) x ((2n+1)(2n+3)---(4n—1)) 
x (1x 2x« +++ 2n)2*" (mod q) 
= ((—1)(—3)---(—2n + 1))(—1)" x ((2n + 1)(2n +3) ---(4n—1)) 
x (1x 2x« +++ x 2n)2*" (mod q) 
= ((4n)(4n —2)---(2n+2))(—1)" x ((2n +1)(2n+3)-+-(4n—1)) 
x (1x 2x«---x 2n)2°" (mod g) 
since —1=4n, —3 =4n—-2,... (mod gq) 
=((2n+1)(2n+2)-+-(4n))(—1)" x (1x 2x +++ x 2n)2*” (mod q) 
== (—1)"2°"(1 x 2x +++ x 4) (mod q) 


Cancelling 1 x 2 x --- x 4n from the first and last line, we get 


that 1s, 
g-1 


2*r =(-1)"7 (modg), wheng=4n-+1. 


There is a similar proof (exercises) that 


g-l gtl 


22 =(-1)* (modg), wheng=4n+3. 


To decide when 2 is a square mod $q$ we therefore have to look at two
cases:

If $q = 4n+1$, then $\frac{q-1}{4} = n$. So $\left(\frac{2}{q}\right) = (-1)^{\frac{q-1}{4}}$ is 1 when $n = 2m$ and $-1$
when $n = 2m+1$. That is, 2 is a square mod $q$ when $q = 8m+1$, not when
$q = 8m+5$.

If $q = 4n+3$, then $\frac{q+1}{4} = n+1$. So $\left(\frac{2}{q}\right) = (-1)^{\frac{q+1}{4}}$ is 1 when $n = 2m+1$
and $-1$ when $n = 2m$. That is, 2 is a square mod $q$ when $q = 8m+7$, not
when $q = 8m+3$.


To sum up: 2 is a square mod $q$ $\Leftrightarrow$ $q = 8m \pm 1$.
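The criterion can be checked numerically for small primes. The following Python sketch is an illustration only; the helper names `is_prime`, `two_is_square` and `check_supplement_two` are not from the text. It tests by brute force whether 2 is a square mod q and compares the answer both with the rule q = 8m ± 1 and with Euler's criterion.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def two_is_square(q):
    """Brute-force test: is there an x with x^2 congruent to 2 (mod q)?"""
    return any(x * x % q == 2 for x in range(1, q))


def check_supplement_two(bound):
    """Compare three tests for every odd prime q below bound."""
    for q in range(3, bound, 2):
        if not is_prime(q):
            continue
        by_rule = q % 8 in (1, 7)            # q = 8m + 1 or q = 8m - 1
        by_euler = pow(2, (q - 1) // 2, q) == 1  # Euler's criterion
        if not (two_is_square(q) == by_rule == by_euler):
            return False
    return True


print(check_supplement_two(500))
```

A brute-force search is used deliberately here, so that the check does not depend on the result being tested.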
Exercises

The proof for $q = 4n+3$ splits the product $1 \times 2 \times \cdots \times 4n \times (4n+1) \times (4n+2)$
in a slightly less regular way, necessarily, because the number of terms is not

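9.4.1 By partitioning $10! = 1 \times 2 \times 3 \times 4 \times 5 \times 6 \times 7 \times 8 \times 9 \times 10$ as

\[ (1 \times 3 \times 5)(7 \times 9)(2 \times 4 \times 6 \times 8 \times 10), \]

removing $-1$ and 2 from appropriate factors, and then changing negative
factors $-k$ back to positive $11-k$, show that

\[ 10! \equiv 10!\,(-1)^3 2^5 \pmod{11}, \]

and hence that $\left(\frac{2}{11}\right) = -1$.

9.4.2 By partitioning $(4n+2)! = 1 \times 2 \times 3 \times \cdots \times 4n \times (4n+1) \times (4n+2)$ as

\[ (1 \times 3 \times 5 \times \cdots \times (2n+1))\,((2n+3) \times (2n+5) \times \cdots \times (4n+1))
   \times (2 \times 4 \times 6 \times \cdots \times 4n \times (4n+2)), \]

removing $-1$ and 2 from appropriate factors, and then changing negative
factors $-k$ back to positive $4n+3-k$, show that

\[ (4n+2)! \equiv (4n+2)!\,(-1)^{n+1} 2^{2n+1} \pmod{4n+3}. \]


9.4.3 Deduce from Exercise 9.4.2 that $\left(\frac{2}{q}\right) = (-1)^{\frac{q+1}{4}}$ when $q = 4n+3$.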


9.5 The story so far 


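In Section 9.1 we observed that classifying primes of the forms $x^2 + y^2$,
$x^2 + 2y^2$, $x^2 + 3y^2$ depends on knowing that certain numbers are squares
mod certain primes. To prove such results we introduced the Legendre
symbol, defined for any integer $P \not\equiv 0$ (mod $q$) and odd primes $q$ by

\[
\left(\frac{P}{q}\right) =
\begin{cases}
1 & \text{if $P$ is a square, mod $q$}\\
-1 & \text{if $P$ is not a square, mod $q$.}
\end{cases}
\]

Thanks to the Euler criterion,

\[ \left(\frac{P}{q}\right) \equiv P^{\frac{q-1}{2}} \pmod{q}, \]

valid for any $P \not\equiv 0$ (mod $q$), we can prove the multiplicative property

\[ \left(\frac{P_1}{q}\right)\left(\frac{P_2}{q}\right) = \left(\frac{P_1 P_2}{q}\right) \]

and hence find $\left(\frac{P}{q}\right)$ for any $P \not\equiv 0$ (mod $q$) by splitting $P$ into factors
$P_1, P_2, \ldots$ that are either $-1$ or primes, then multiplying $\left(\frac{P_1}{q}\right), \left(\frac{P_2}{q}\right), \ldots$.

In Sections 9.3 and 9.4 we used the Euler criterion to prove the supplements
to quadratic reciprocity:

\[ \left(\frac{-1}{q}\right) = 1 \Leftrightarrow q = 4n+1, \tag{I} \]
\[ \left(\frac{2}{q}\right) = 1 \Leftrightarrow q = 8n \pm 1. \tag{II} \]

Thus it remains to evaluate $\left(\frac{p}{q}\right)$ for odd primes $p$ and $q$. This is done by
the quadratic reciprocity law, which is proved in Section 9.8:

\[ \left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\frac{p-1}{2}\cdot\frac{q-1}{2}}. \]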


Using quadratic reciprocity 


Quadratic reciprocity says that

\[
\left(\frac{p}{q}\right) =
\begin{cases}
\left(\frac{q}{p}\right) & \text{if one of } p, q = 4n+1,\\
-\left(\frac{q}{p}\right) & \text{otherwise.}
\end{cases}
\]

Another point to bear in mind is that if $p \equiv p'$ (mod $q$), then

$p$ is a square mod $q$ $\Leftrightarrow$ $p'$ is a square mod $q$.

Thus we can replace $p$ in $\left(\frac{p}{q}\right)$ by its remainder $p'$ on division by $q$. One
then "reciprocates" $\left(\frac{p'}{q}\right)$ to $\pm\left(\frac{q}{p'}\right)$ by quadratic reciprocity, replaces $q$ by
its remainder $q'$ on division by $p'$, and so on. In effect, one interweaves the
Euclidean algorithm with applications of multiplicativity to rapidly reduce
the numbers to the point where supplements I and II can be applied.

Example. Decide whether 37 is a square mod 59.

\begin{align*}
\left(\frac{37}{59}\right) &= \left(\frac{59}{37}\right) && \text{by reciprocity}\\
&= \left(\frac{22}{37}\right) && \text{by remaindering}\\
&= \left(\frac{2}{37}\right)\left(\frac{11}{37}\right) && \text{by multiplicativity}\\
&= -\left(\frac{11}{37}\right) && \text{by supplement II}\\
&= -\left(\frac{37}{11}\right) && \text{by reciprocity}\\
&= -\left(\frac{4}{11}\right) && \text{by remaindering}\\
&= -\left(\frac{2}{11}\right)^2 && \text{by multiplicativity}\\
&= -1.
\end{align*}

Hence 37 is not a square mod 59.
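The procedure just illustrated (remaindering, splitting into prime factors, supplement II, reciprocity) can be sketched in Python. This is a minimal illustration under stated assumptions, not an efficient implementation: the names `prime_factors` and `legendre` are chosen here, q is assumed to be an odd prime, and the factorization step uses plain trial division. Because remainders are taken in the range 1 to q − 1, the factor −1 and supplement I are not needed.

```python
def prime_factors(n):
    """Prime factors of n >= 1, with multiplicity, by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def legendre(P, q):
    """Legendre symbol (P/q); q is assumed to be an odd prime not dividing P."""
    P %= q                                    # remaindering
    if P == 0:
        raise ValueError("P must not be divisible by q")
    result = 1
    for p in prime_factors(P):                # multiplicativity
        if p == 2:
            # supplement II: (2/q) = 1 exactly when q = 8n + 1 or 8n - 1
            result *= 1 if q % 8 in (1, 7) else -1
        else:
            # reciprocity: (p/q) = (q/p) unless p and q are both of the form 4n + 3
            sign = -1 if (p % 4 == 3 and q % 4 == 3) else 1
            result *= sign * legendre(q, p)
    return result


print(legendre(37, 59))
```

Each recursive call has a strictly smaller prime in the second argument, so the recursion terminates, much as the Euclidean algorithm does.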


Exercises 
9.5.1 Show that $\left(\frac{55}{89}\right) = 1$ by using multiplicativity and the Euclidean algorithm.
9.5.2 Verify directly that $\left(\frac{55}{89}\right) = 1$ by finding a square $\equiv 55$ (mod 89).
9.5.3 Show that $\left(\frac{56}{89}\right) = -1$.


9.6 The Chinese remainder theorem


An example 


The Chinese remainder theorem is about representing numbers by their 
remainders. For example, here are the numbers n = 0,1,2,..., 14 and their 
remainders n mod 3 and n mod 5 on division by 3 and 5 respectively. 


n          0  1  2  3  4  5  6  7  8  9  10  11  12  13  14
n mod 3    0  1  2  0  1  2  0  1  2  0   1   2   0   1   2
n mod 5    0  1  2  3  4  0  1  2  3  4   0   1   2   3   4

It can be checked that each of the 15 numbers n = 0, 1, 2, ..., 14 has a
different pair of remainders, and hence each such n is determined by its pair
of remainders. For example, the only number with the pair of remainders
(2,3) is 8.


It is also easy to see why this is true. 


• The first component of each pair, n mod 3, runs through the sequence
012012..., which repeats every three steps.


• The second component, n mod 5, runs through 0123401234..., which
repeats every five steps.


• Therefore, no pair is repeated until after lcm(3,5) = 15 steps, and
hence the first 15 pairs are different.


Classical Chinese remainder theorem 


The original form of the remainder theorem, found in China around 300
CE, goes as follows. If gcd(a,b) = 1, then each n = 0, 1, 2, ..., ab−1 has
a distinct pair of remainders on division by a and b.

This can be proved by a generalization of the argument above.


• The first remainder of each pair, n mod a, runs through the sequence
012...(a−1)012...(a−1)...
which repeats every a steps.
• The second remainder, n mod b, runs through the sequence
012...(b−1)012...(b−1)...
which repeats every b steps.


• Therefore, no remainder pair is repeated until after lcm(a,b) = ab
steps, and hence the first ab pairs are different. □


The condition gcd(a,b) = 1 says that a and b have no common prime factor,
so their common multiples include all their prime factors, and hence
lcm(a,b) = ab.
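The classical theorem is easy to test by enumeration. The Python sketch below is an illustration, with function names chosen here rather than taken from the text. It lists the remainder pairs of n = 0, 1, ..., ab−1 and checks whether they are all different; for small moduli the pairs turn out to be distinct exactly when gcd(a,b) = 1.

```python
from math import gcd


def remainder_pairs(a, b):
    """Pairs (n mod a, n mod b) for n = 0, 1, ..., ab - 1."""
    return [(n % a, n % b) for n in range(a * b)]


def pairs_are_distinct(a, b):
    """True if no remainder pair occurs twice among n = 0, ..., ab - 1."""
    pairs = remainder_pairs(a, b)
    return len(set(pairs)) == len(pairs)


print(pairs_are_distinct(3, 5))   # coprime moduli
print(pairs_are_distinct(4, 6))   # gcd(4, 6) = 2, pairs repeat after lcm(4, 6) = 12 steps
print(remainder_pairs(3, 5).index((2, 3)))  # the n with remainder pair (2, 3)
```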
Exercises


The classical example of a Chinese remainder problem is in the Mathematical 
Manual of Sun Zi, late in the 3rd century CE. It is required to find a number that 
leaves remainder 2 on division by 3, remainder 3 on division by 5, and remainder 
2 on division by 7. 


9.6.1 Show that the numbers 1, 2, 3, ..., 105 all leave distinct triples of remainders
on division by 3, 5, 7 respectively.


9.6.2 Find a generalization of this result to triples of remainders on division by a, 
b, c, with suitable conditions on the moduli a, b, c. 


9.6.3 Find the minimal solution of Sun Zi’s problem. 


9.6.4 Describe the numbers with remainder 1 on division by 3 and remainder 2 on
division by 5, and hence find the least of them with remainder 3 on division
by 7.


9.7 The full Chinese remainder theorem 


The modern form of the theorem not only represents each of the numbers
n = 0, 1, 2, ..., ab−1 by a pair (n mod a, n mod b), it also recognizes that
these n can be added and multiplied by adding and multiplying the
corresponding pairs.

The first components of pairs are, naturally, added or multiplied mod 
a; the second components are added or multiplied mod b, so we speak of 
pairs being congruent (mod a, mod b). 


Example. Adding and multiplying mod 15. 


Consider 8 and 9, and their sum and product mod 15. We have 


8 represented by (2,3) 
9 represented by (0,4). 


Adding these pairs mod 3 in the first component and mod 5 in the second, 
we get 


(2,3) + (0,4) = (2+0, 3+4) = (2,7)
              ≡ (2,2) (mod 3, mod 5).


(2,2) is the pair that represents 2, and indeed 8 + 9 ≡ 2 (mod 15).

Similarly, if we multiply (2,3) and (0,4), mod 3 in the first component
and mod 5 in the second, we get

(2,3) × (0,4) = (2×0, 3×4) = (0,12)
              ≡ (0,2) (mod 3, mod 5).

(0,2) is the pair that represents 12, and indeed 8 × 9 ≡ 12 (mod 15).

The full Chinese remainder theorem says that the pair (m mod a, m mod b)
corresponds to m, mod ab, and that

(m mod a, m mod b) + (n mod a, n mod b)
    ≡ (m+n mod a, m+n mod b) (mod a, mod b)

and

(m mod a, m mod b) × (n mod a, n mod b)
    ≡ (mn mod a, mn mod b) (mod a, mod b).

This follows easily from addition and multiplication of congruences.

m mod a is ≡ m (mod a),
n mod a is ≡ n (mod a),

and therefore, by addition of congruences,

(m mod a) + (n mod a) is ≡ m + n (mod a).


Similarly for addition mod b, and for multiplication mod a and mod b.


This version of the theorem shows that the pairs (n mod a, n mod b)
not only correspond 1-to-1 to numbers n (mod ab) (we need gcd(a,b) = 1
for this part), but also behave the same under + and × (mod a, mod b).
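The correspondence can be tried out directly. In the Python sketch below, which is illustrative only, the helper names `to_pair`, `add_pairs`, `mul_pairs` and `from_pair` are assumptions of this example. `from_pair` recovers n by simple search, which relies on gcd(a,b) = 1 for uniqueness.

```python
def to_pair(n, a, b):
    """Represent n by its remainder pair (n mod a, n mod b)."""
    return (n % a, n % b)


def add_pairs(u, v, a, b):
    """Add pairs componentwise, mod a in the first place and mod b in the second."""
    return ((u[0] + v[0]) % a, (u[1] + v[1]) % b)


def mul_pairs(u, v, a, b):
    """Multiply pairs componentwise, mod a and mod b."""
    return ((u[0] * v[0]) % a, (u[1] * v[1]) % b)


def from_pair(pair, a, b):
    """Find the n in 0..ab-1 with the given remainder pair (assumes gcd(a, b) = 1)."""
    for n in range(a * b):
        if to_pair(n, a, b) == pair:
            return n
    raise ValueError("no number has this remainder pair")


a, b = 3, 5
pair_sum = add_pairs(to_pair(8, a, b), to_pair(9, a, b), a, b)
pair_product = mul_pairs(to_pair(8, a, b), to_pair(9, a, b), a, b)
print(pair_sum, from_pair(pair_sum, a, b))          # compare with (8 + 9) mod 15
print(pair_product, from_pair(pair_product, a, b))  # compare with (8 * 9) mod 15
```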


Invertible elements 


The modern Chinese remainder theorem gives a very clear picture of the
group $(\mathbb{Z}/ab\mathbb{Z})^\times$ of invertible elements under multiplication mod ab.

As we have just seen, when gcd(a,b) = 1, n behaves (mod ab) as the
pair (n mod a, n mod b) does (mod a, mod b). In particular, n has an
inverse (mod ab) if and only if n mod a has an inverse (mod a) and n mod
b has an inverse (mod b).

Example. Invertible elements, mod 15.


These are the n for which the remainder pairs (n mod 3, n mod 5) have
inverses (mod 3, mod 5). Since 3 and 5 are primes, these are precisely the
pairs in which n mod 3 and n mod 5 are nonzero.

There are two nonzero elements mod 3 (namely 1 and 2) and four
nonzero elements mod 5 (namely 1, 2, 3 and 4), hence there are

2 × 4 = 8

pairs (n mod 3, n mod 5) of nonzero elements, and hence eight invertible
elements n (mod 15). They can be read off the table in Section 9.6 as the
numbers 1, 2, 4, 7, 8, 11, 13, 14: those for which the corresponding pairs
have no zeros.


This example generalizes to a key theorem about the φ function.

Multiplicative property of φ. When gcd(a,b) = 1, φ(ab) = φ(a)φ(b).

Proof. By the criterion for inverses in Section 3.6, there are φ(a) invertible
elements (mod a) and φ(b) invertible elements (mod b). Therefore, if
gcd(a,b) = 1 there are φ(a)φ(b) invertible pairs (n mod a, n mod b); that
is, φ(a)φ(b) invertible elements (mod ab). But the number of invertible
elements (mod ab) is φ(ab). Hence if gcd(a,b) = 1, we have


φ(ab) = φ(a)φ(b). □
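The multiplicative property can be checked by counting. The following Python sketch is an illustration; `phi` and `invertible_elements` are names chosen here. It counts invertible elements directly from the gcd criterion and compares φ(ab) with φ(a)φ(b).

```python
from math import gcd


def invertible_elements(n):
    """The k in 1..n that are invertible mod n, i.e. with gcd(k, n) = 1."""
    return [k for k in range(1, n + 1) if gcd(k, n) == 1]


def phi(n):
    """Euler's phi function, computed by direct counting."""
    return len(invertible_elements(n))


print(invertible_elements(15))
print(phi(15), phi(3) * phi(5))   # coprime factors: the two values agree
print(phi(4), phi(2) * phi(2))    # gcd(2, 2) = 2: the two values differ
```

The last line shows why the hypothesis gcd(a,b) = 1 cannot be dropped.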


Exercises 


Thanks to the multiplicative property, we can now complete our search for an
explicit formula for φ(n), begun in the exercises to Section 3.6.


9.7.1 Using Exercise 3.6.3, show that $\varphi(n) = n\left(1-\frac{1}{p_1}\right)\cdots\left(1-\frac{1}{p_k}\right)$, where $p_1, p_2,
\ldots, p_k$ are the distinct prime divisors of $n$.


9.7.2 Use the formula to show that φ(60) = 16.


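9.8 Proof of quadratic reciprocity


A formula for $\left(\frac{p}{q}\right)$ and $\left(\frac{q}{p}\right)$


We now give a formula that simultaneously exhibits $\left(\frac{p}{q}\right)$ and $\left(\frac{q}{p}\right)$ in a
product of pairs (mod $p$, mod $q$). This formula is used to prove quadratic
reciprocity below, following the argument of Rousseau (1991).

When $p$ and $q$ are different odd primes we consider the invertible numbers
mod $pq$, which are those divisible by neither $p$ nor $q$. The invertible
$x$ in the range $1 \le x \le \frac{pq-1}{2}$, taken mod $p$, consist of $\frac{q-1}{2}$ sequences
$1, 2, \ldots, p-1$ and the "half sequence" $1, 2, \ldots, \frac{p-1}{2}$, minus the $\frac{p-1}{2}$ multiples
$q, 2q, \ldots, \frac{p-1}{2}q$ of $q$ in this range.

The mod $p$ product of these invertible $x$ is therefore

\[
\prod_{\substack{\text{invertible } x\\ 1 \le x \le \frac{pq-1}{2}}} x
\equiv \frac{(p-1)!^{\frac{q-1}{2}} \left(\frac{p-1}{2}\right)!}{q^{\frac{p-1}{2}} \left(\frac{p-1}{2}\right)!}
\equiv (-1)^{\frac{q-1}{2}} \left(\frac{q}{p}\right) \pmod{p},
\]

since $\left(\frac{p-1}{2}\right)!$ cancels, $(p-1)! \equiv -1$ by Wilson's theorem, and $q^{\frac{p-1}{2}} \equiv \left(\frac{q}{p}\right)$
by Euler's criterion. Similarly, the mod $q$ product of these invertible $x$ is

\[
\prod_{\substack{\text{invertible } x\\ 1 \le x \le \frac{pq-1}{2}}} x
\equiv (-1)^{\frac{p-1}{2}} \left(\frac{p}{q}\right) \pmod{q},
\]

and therefore the (mod $p$, mod $q$) product of pairs $(x,x)$ is

\[
\prod_{\substack{\text{invertible } x\\ 1 \le x \le \frac{pq-1}{2}}} (x,x)
\equiv \left( (-1)^{\frac{q-1}{2}} \left(\frac{q}{p}\right),\ (-1)^{\frac{p-1}{2}} \left(\frac{p}{q}\right) \right)
\quad (\text{mod } p, \text{ mod } q). \tag{1}
\]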
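Formula (1) lends itself to a numerical check for small primes. The Python sketch below is an illustration only, with names chosen for this example. It multiplies the invertible x in the range 1 ≤ x ≤ (pq−1)/2 mod p and mod q, and compares the result with the two components of (1), where the Legendre symbols are computed from Euler's criterion.

```python
from math import gcd


def legendre_symbol(P, q):
    """Legendre symbol via Euler's criterion (q an odd prime not dividing P)."""
    return 1 if pow(P, (q - 1) // 2, q) == 1 else -1


def half_range_products(p, q):
    """Product of the invertible x with 1 <= x <= (pq-1)/2, taken mod p and mod q."""
    prod_p, prod_q = 1, 1
    for x in range(1, (p * q - 1) // 2 + 1):
        if gcd(x, p * q) == 1:
            prod_p = prod_p * x % p
            prod_q = prod_q * x % q
    return prod_p, prod_q


def formula_one(p, q):
    """The pair on the right-hand side of (1), reduced mod p and mod q."""
    first = (-1) ** ((q - 1) // 2) * legendre_symbol(q, p)
    second = (-1) ** ((p - 1) // 2) * legendre_symbol(p, q)
    return first % p, second % q


for p, q in [(3, 5), (3, 7), (5, 7), (7, 11), (11, 13)]:
    print(p, q, half_range_products(p, q) == formula_one(p, q))
```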


Completion of the proof 


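Now we evaluate $\prod (x,x)$ over the invertible $x$ in a different way, expressing
it in powers of $-1$ alone. Using the Chinese remainder theorem, we view
it as a product of pairs $(a,b)$, with $a$ and $b$ varying independently over
suitable ranges.

The $x$ in the range $1 \le x \le \frac{pq-1}{2}$ include exactly one number from
each pair $\{x, -x\}$ (mod $pq$) with $1 \le x \le pq-1$. Hence the corresponding
remainder pairs,

\[ (a,b) = (x \bmod p,\ x \bmod q), \]

include exactly one of each pair $\{(a,b), (-a,-b)\}$. We get exactly one of
each (mod $p$, mod $q$) by taking the ranges $1 \le a \le p-1$ and $1 \le b \le \frac{q-1}{2}$.
This makes the sign uncertain, but at any rate

\[
\prod_{\substack{\text{invertible } x\\ 1 \le x \le \frac{pq-1}{2}}} (x,x)
\equiv \pm\left( (p-1)!^{\frac{q-1}{2}},\ \left(\tfrac{q-1}{2}\right)!^{\,p-1} \right)
\quad (\text{mod } p, \text{ mod } q), \tag{2}
\]

since each value of $a$, $1 \le a \le p-1$, occurs in $(q-1)/2$ pairs, and each
value of $b$, $1 \le b \le (q-1)/2$, occurs in $p-1$ pairs. Thanks to Wilson's
theorem, we can express the powers of factorials in (2) as powers of $-1$.
Since $(p-1)! \equiv -1$ (mod $p$), the first component $\equiv (-1)^{\frac{q-1}{2}}$ (mod $p$).
To find $((q-1)/2)!$ (mod $q$) we rewrite $-1 \equiv (q-1)!$ as in Section 6.5:

\begin{align*}
-1 &\equiv (q-1)! \pmod{q}\\
&\equiv 1 \times 2 \times \cdots \times ((q-1)/2)\\
&\qquad \times (-(q-1)/2) \times \cdots \times (-2) \times (-1) \pmod{q}\\
&\equiv ((q-1)/2)!^2\,(-1)^{\frac{q-1}{2}} \pmod{q}.
\end{align*}

Therefore

\[ ((q-1)/2)!^2 \equiv (-1)(-1)^{\frac{q-1}{2}} \pmod{q}. \]

Raising both sides to the power $\frac{p-1}{2}$, we get the second component of (2),

\[ ((q-1)/2)!^{\,p-1} \equiv (-1)^{\frac{p-1}{2}} (-1)^{\frac{p-1}{2}\cdot\frac{q-1}{2}} \pmod{q}. \]

Thus the expression (2) for $\prod (x,x)$ (mod $p$, mod $q$) reduces to

\[
\prod_{\substack{\text{invertible } x\\ 1 \le x \le \frac{pq-1}{2}}} (x,x)
\equiv \pm\left( (-1)^{\frac{q-1}{2}},\ (-1)^{\frac{p-1}{2}} (-1)^{\frac{p-1}{2}\cdot\frac{q-1}{2}} \right)
\quad (\text{mod } p, \text{ mod } q). \tag{3}
\]

Equating (3) with (1) in the previous subsection we get either

\[ \left(\frac{q}{p}\right) = 1 \quad \text{and} \quad \left(\frac{p}{q}\right) = (-1)^{\frac{p-1}{2}\cdot\frac{q-1}{2}} \]

or

\[ \left(\frac{q}{p}\right) = -1 \quad \text{and} \quad \left(\frac{p}{q}\right) = -(-1)^{\frac{p-1}{2}\cdot\frac{q-1}{2}}. \]

In either case,

\[ \left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\frac{p-1}{2}\cdot\frac{q-1}{2}}. \qquad \Box \]

Exercises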


With the help of quadratic reciprocity we can find, for any fixed odd prime $p$,
the primes $q$ for which $p$ is a square mod $q$. They fall into a finite number of
arithmetic progressions (as we have already seen they do for $p = -1$ and $p = 2$).
Here is what happens for $p = 3$.


9.8.1 Explain why every odd prime $q > 3$ is of one of the forms $12n+1$, $12n+5$,
$12n+7$, or $12n+11$.


9.8.2 Use quadratic reciprocity with remaindering, as in Section 9.5, to show that
3 is a square mod $q$ for precisely the odd primes $q$ of the form $12n+1$ and
$12n+11$.


By multiplying the values of $\left(\frac{3}{q}\right)$ found in Exercise 9.8.2 by the corresponding
values of $\left(\frac{-1}{q}\right)$ we can also obtain the values of $\left(\frac{-3}{q}\right)$.

9.8.3 Show that $-1$ is a square mod $q$ for the odd primes $q$ of the form $12n+1$
and $12n+5$, and a nonsquare for those of the form $12n+7$ and $12n+11$.


Deduce that $-3$ is a square mod $q$ for the odd primes $q$ of the form $12n+1$
and $12n+7$.


Similarly, we can find the values of $\left(\frac{5}{q}\right)$ and $\left(\frac{-5}{q}\right)$.


9.8.4 Show that the odd primes $q$ for which 5 is a square mod $q$ are precisely
those of the form $20n+1$, $20n+9$, $20n+11$, and $20n+19$. Test this result
by showing that 5 is a square mod 41, 29, 11, and 19.


9.8.5 Show that the odd primes $q$ for which $-5$ is a square mod $q$ are precisely
those of the form $20n+1$, $20n+3$, $20n+7$, and $20n+9$. Test this result by
showing that $-5$ is a square mod 41, 23, 7, and 29.


9.9 Discussion


As we have suggested over the last few chapters, quadratic reciprocity
emerged from the study of primes represented by quadratic forms such
as $x^2 + y^2$, $x^2 + 2y^2$, and $x^2 + 3y^2$. Fermat was the first to raise and answer
such questions, but his methods are not known. As far as we know, Euler
was the first to recognize the role of quadratic reciprocity and to prove it in
special cases.

The first to attempt a general proof was Legendre (1785). However,
his proof depended on the unproved assumption that any arithmetic
progression $an + b$ with gcd(a,b) = 1 contains infinitely many primes. This
assumption is easily proved in certain cases (such as the cases $4n+1$ and
$4n+3$ mentioned in the exercises to Section 6.3 and in Section 6.7) but the
general theorem is harder to prove than reciprocity itself. The first proof
was given by Dirichlet (1837), and the deep analytic methods he devised to
prove it are still the standard approach to primes in arithmetic progressions.

Gauss found the first proof of quadratic reciprocity on April 18, 1796,
when he was not quite 19. It is a long and ugly proof, and by the time he
published it in his Disquisitiones Arithmeticae (1801) he had found two
more proofs, one using quadratic forms and the other using roots of unity.
Quadratic reciprocity was Gauss's favorite theorem and altogether he gave
eight proofs of it. Since then, many other mathematicians have published
proofs, some of them variations or simplifications of Gauss's, and others
introducing new ideas.

Like Pythagoras’ theorem in geometry, quadratic reciprocity is a core 
theorem in number theory, bound to arise no matter how one approaches 
quadratic Diophantine equations. This is why the theorem has so many 
proofs: all roads lead to it, and each road shows it from a different angle. 
A comprehensive history of quadratic reciprocity, including a table and 
classification of 196 (!) proofs given up to the year 2000, may be found
in Lemmermeyer (2000). Another book of interest is Pieper (1978), which 
discusses 14 different proofs in detail. 

The law of quadratic reciprocity generalizes to cubic, biquadratic, and
higher power reciprocity laws. Just as the quadratic character has values
$\pm 1$ (the square roots of 1), there is a cubic character with values 1, $\zeta_3$,
$\zeta_3^2$ (the cube roots of 1), and a biquadratic character with values $\pm 1$, $\pm i$
(the fourth roots of 1). These generalizations were not made because
mathematicians had run out of things to say about quadratic forms. Quite the
contrary: quadratic forms themselves demand that cubes and fourth powers
be considered. This was discovered by Euler, who noticed the following
results (later proved by Gauss):


$p$ is a prime of the form $x^2 + 27y^2$ $\Leftrightarrow$ $p = 3n+1$ and 2 is a cube mod $p$
$p$ is a prime of the form $x^2 + 64y^2$ $\Leftrightarrow$ $p = 4n+1$ and 2 is a fourth power mod $p$


The cubic reciprocity law was found by Eisenstein (1844) and it requires
investigation of $\mathbb{Z}[\zeta_3]$, which is why we call $\mathbb{Z}[\zeta_3]$ the Eisenstein
integers. (Cubic reciprocity was already known to Gauss, but he did not
publish his results.) Likewise, biquadratic reciprocity was discovered by
Gauss and it requires investigation of $\mathbb{Z}[i]$. This in fact was the purpose of
Gauss (1832), where the basic properties of $\mathbb{Z}[i]$ were first published.

An nth power reciprocity law similarly involves the cyclotomic integers
$\mathbb{Z}[\zeta_n]$, with all their attendant difficulties such as the failure of unique
prime factorization discovered by Kummer (1844). In the case of nth power 
reciprocity (and unlike the case of Fermat’s last theorem), Kummer over- 
came these difficulties completely with his theory of ideal numbers, and 
published an nth power reciprocity law in 1850. Eisenstein, also using 
Kummer’s theory of ideal numbers, published a different version of nth 
power reciprocity in the same year. A modern proof of Eisenstein’s reci- 
procity law may be found in Ireland and Rosen (1982), pp. 215-218, and 
the history of all reciprocity laws up to 1850 may be found in Lemmer- 
meyer (2000). 


PREVIEW 


This chapter unites many of the algebraic structures encountered in
this book (the integers, the integers mod n, and the various extensions
of the integer concept by Gauss, Eisenstein and Hurwitz) in
the single abstract concept of ring.

We begin with the general ring concept, specified by certain axioms
for + and ×, and observe how these axioms suffice to capture
general concepts of divisibility, primes, and units. The concept
of field (a ring in which all nonzero elements are units) is briefly
discussed, and the main examples $\mathbb{Q}$, $\mathbb{R}$, $\mathbb{C}$, and $\mathbb{Z}/p\mathbb{Z}$ are reviewed.
We then specialize to rings of algebraic integers, and particularly
quadratic integers. We define algebraic numbers and algebraic
integers and use Dedekind's linear algebra approach to show that the
algebraic integers are closed under +, −, and ×, and hence form a
ring.

The special case of quadratic integer rings, and the quadratic fields
$\mathbb{Q}(\sqrt{d})$ that contain them, is examined in more detail. We give a
general explanation of the phenomenon of quadratic integers, such
as $\frac{-1+\sqrt{-3}}{2}$, that "look fractional", by determining the integers of all
the fields $\mathbb{Q}(\sqrt{d})$ for $d \in \mathbb{Z}$.


The concept of norm, previously seen in special cases, is given a
uniform definition over all quadratic fields. Finally, we specialize
further to the imaginary quadratic fields, the $\mathbb{Q}(\sqrt{d})$ for negative
integers $d$. These enjoy somewhat simpler properties than the $\mathbb{Q}(\sqrt{d})$
for positive $d$. For example, the integers of an imaginary quadratic
field include only finitely many units (at most six).

10.1 The ring axioms


The integers $\mathbb{Z}$, with their operations of + and ×, are the first ring studied
in mathematics, and some basic properties of $\mathbb{Z}$ are taken as the defining
properties (axioms) of rings in general. We briefly mentioned them in
Section 1.3, in connection with abelian groups. An explicit list of the axioms
is as follows. For all integers a, b and c we have:


a + (b + c) = (a + b) + c    (associative law)
a + b = b + a                (commutative law)
a + (−a) = 0                 (additive inverse property)
a + 0 = a                    (identity property of 0)


There is a similar set of rules describing the behavior of ×.


a × (b × c) = (a × b) × c    (associative law)
a × b = b × a                (commutative law)
a × 1 = a                    (identity property of 1)
a × 0 = 0                    (property of 0)


and finally, there is a rule for the interaction of + and ×:

a × (b + c) = a × b + a × c  (distributive law)


Strictly speaking, these are the defining properties of a commutative
ring. We have also dealt occasionally with noncommutative rings, such as
the quaternions $\mathbb{H}$, which satisfy all the above axioms except the
commutative law for ×. The quaternions, however, are close to number rings in
having a multiplicative norm, and noncommutative ring theory in general
has a rather different flavor.

It is fair to say that most of ring theory deals with objects that behave
like integers, and the example of $\mathbb{Z}$ helps us to anticipate which concepts
will be relevant and helpful in dealing with unfamiliar, but "integer-like"
objects. Indeed, we have already used the word "integer" for extensions of
$\mathbb{Z}$ such as $\mathbb{Z}[i]$ (the Gaussian integers), $\mathbb{Z}[\zeta_3]$ (the Eisenstein integers), and
$\mathbb{Z}\left[\frac{1+i+j+k}{2}, i, j, k\right]$ (the Hurwitz integers).


Divisibility and primes 


In the examples of generalized "integers" we have seen the importance of
the ordinary integer concepts of divisibility and primes. In any ring, we
say that b divides a if there is a c such that


a = bc.


Other ways to say this are that a is divisible by b, that b is a divisor of a, or
that a is a multiple of b. Divisibility is an interesting concept in $\mathbb{Z}$ because
an arbitrary integer a is generally not divisible by another integer b. In fact,
it can be hard to decide whether a has any divisors at all, apart from the obvious
ones ±1 and ±a. An ordinary integer with no divisors except the obvious
ones is called prime. In an arbitrary ring R the concept of prime is the
same, except that the place of ±1 is taken by the units of R: the elements
of R that divide 1 (or, equivalently, the invertible elements of R). Thus we
call a ∈ R prime if a is divisible only by units and units times a (the latter
are called associates of a).

Even in Z the primes form no clear pattern, and of course prime num- 
bers figure in many of the classic unsolved problems about Z. Thus the 
development of ring theory has been heavily influenced by the problem of 
understanding primes. The best understood rings, such as Z[i], tend to be 
those whose primes behave like the primes in Z. In some rings where this 
is not the case—specifically, where unique prime factorization fails—it has 
been found worthwhile to create “ideal” primes that behave better than the 
actual primes. We take up the story of these “ideal” primes in Chapters 11 
and 12. 


Exercises 


Recall from Chapter 8 that the quaternions $\mathbb{H}$ are certain 2 × 2 matrices. In fact,
the (noncommutative) ring concept extends much further in this direction, to rings
of n × n matrices for any fixed natural number n. For the moment we consider
matrix |. 

10.1.1 Check that the axioms for + hold. 


10.1.2 Check that the axioms for × hold, except the commutative law (when
n ≥ 2). You need not use explicit multiplication to prove the associative
law. Why?


10.1.3 Check that the distributive law holds.


10.2 Rings and fields


The sets $\mathbb{Q}$ (rational numbers), $\mathbb{R}$ (real numbers), and $\mathbb{C}$ (complex
numbers) are also rings because they obviously have the properties of +, −,
and × listed in the previous section. This is no surprise because $\mathbb{Q}$, $\mathbb{R}$,
and $\mathbb{C}$ were always intended to extend the concept of integer, retaining all
the ring properties and adding more. One thing that $\mathbb{Q}$, $\mathbb{R}$, and $\mathbb{C}$ have
and $\mathbb{Z}$ does not is a multiplicative inverse $a^{-1}$ for each nonzero element,
with the characteristic property that $aa^{-1} = 1$. A commutative ring with a
multiplicative inverse for each nonzero element is called a field.

Fields are not really typical rings, and their theory has quite a different
flavor from ring theory. In particular, divisibility is not an interesting notion
in a field because a nonzero element $b$ divides any element $a$ (with quotient
$a/b = ab^{-1}$). Likewise, the concept of unit is not interesting because all
nonzero elements of the field are units. Nevertheless, fields play an
important role in ring theory. Many of the rings we used in earlier chapters, such
as $\mathbb{Z}[i]$, $\mathbb{Z}[\sqrt{-2}]$, and $\mathbb{Z}[\zeta_3]$, are embedded in the complex field $\mathbb{C}$. Since $\mathbb{C}$
has all the ring properties (associative laws, commutative laws, and so on)
the same will be true of any subset $R \subseteq \mathbb{C}$ for which the expressions $a+b$,
$-a$, $a \times b$ mentioned in the ring axioms make sense. That is, a set $R \subseteq \mathbb{C}$ is
a ring provided $R$ is closed under +, −, and ×. This means that if $a, b \in R$
then $a+b,\ a-b,\ a \times b \in R$.

The rings just mentioned exemplify the process of "closing" a set under
the operations +, −, and ×. In these examples, we take a number $a$ not in
$\mathbb{Z}$ and close the set $\mathbb{Z} \cup \{a\}$ by forming all possible sums, differences and
products involving $a$ and the integers. The result is called the ring $\mathbb{Z}[a]$.

If we then take an element $b$ not in $\mathbb{Z}[a]$ and repeat the process of closing
under +, −, and ×, the resulting ring is called $\mathbb{Z}[a, b]$, and so on. This is
in accordance with the notations we have already used for the rings $\mathbb{Z}[i, j, k]$
and $\mathbb{Z}\left[\frac{1+i+j+k}{2}, i, j, k\right]$ of quaternion integers. Any subset of the quaternions
closed under +, −, and × is a ring, though generally noncommutative.
(Indeed, it is clear that any subring of $\mathbb{H}$ containing nonzero multiples of more
than one of $i$, $j$, $k$ is noncommutative.)

When the ring is a subset of $\mathbb{C}$, we can form its closure under division
by nonzero elements, and obtain a field. If we take an element $a$ not
in such a field $F$, and again close under +, −, ×, and ÷ (by nonzero
elements), then the result is called $F(a)$. We occasionally use this round
bracket notation for fields such as $\mathbb{Q}(\sqrt{2})$, though in fact here we have


$\mathbb{Q}[\sqrt{2}] = \mathbb{Q}(\sqrt{2})$.

Finite rings and fields


Finite rings came up implicitly in Section 3.2, where we introduced the
notation Z/nZ for the set of congruence classes mod n under the operations
of + and × for congruence classes. We did not comment on it at the time,
but it is easy to see that Z/nZ is a ring. Its ring properties are "inherited"
from its parent ring Z. For example, the + operation on congruence classes
is commutative because


(nZ + a) + (nZ + b) = nZ + (a + b)         by definition of + in Z/nZ
                    = nZ + (b + a)         by commutative law for + in Z
                    = (nZ + b) + (nZ + a)  by definition of + in Z/nZ


We also showed, in Section 3.3, that every nonzero element of Z/pZ
has a multiplicative inverse when p is prime. Thus the finite ring Z/pZ is
a field.
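For small n these statements can be confirmed by exhaustive checking. The Python sketch below is an illustration, not part of the text's argument: it represents the class nZ + a by the remainder a in 0, ..., n−1, and the function names are chosen here. It tests the commutative ring axioms of Section 10.1 for Z/nZ and tabulates multiplicative inverses.

```python
def is_commutative_ring(n):
    """Brute-force check of the commutative ring axioms for Z/nZ."""
    elements = range(n)

    def add(x, y):
        return (x + y) % n

    def mul(x, y):
        return (x * y) % n

    for a in elements:
        # identity properties, property of 0, additive inverse
        if add(a, 0) != a or mul(a, 1 % n) != a or mul(a, 0) != 0:
            return False
        if add(a, (-a) % n) != 0:
            return False
        for b in elements:
            # commutative laws
            if add(a, b) != add(b, a) or mul(a, b) != mul(b, a):
                return False
            for c in elements:
                # associative and distributive laws
                if add(a, add(b, c)) != add(add(a, b), c):
                    return False
                if mul(a, mul(b, c)) != mul(mul(a, b), c):
                    return False
                if mul(a, add(b, c)) != add(mul(a, b), mul(a, c)):
                    return False
    return True


def inverses(n):
    """Map each invertible element of Z/nZ to its multiplicative inverse."""
    return {a: b for a in range(1, n) for b in range(1, n) if a * b % n == 1}


def is_field(n):
    """Z/nZ is a field when every nonzero element has an inverse."""
    return n > 1 and len(inverses(n)) == n - 1


print(is_commutative_ring(6), is_field(6))
print(is_commutative_ring(7), is_field(7), inverses(7))
```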

The units of the ring Z/nZ are particularly interesting, since they form
the group $(\mathbb{Z}/n\mathbb{Z})^\times$, which can be quite complicated. It is easy to see
(exercises) that the units of any ring form a group. If the ring is noncommutative
then the group of units may be noncommutative, as we saw in the case of
the Hurwitz integers in the exercises to Section 8.5. We have also seen that
infinite groups of units occur, for example in $\mathbb{Z}[\sqrt{2}]$. This was implicit in
Section 5.4.


Exercises 
10.2.1 Show that the product of two units in a ring R is also a unit of R. 


10.2.2 Show also that the multiplicative inverse of a unit is a unit, and hence that 
the units of any ring form a group. 


The ring Z/nZ, for n not prime, differs from Z in having zero divisors— 
nonzero elements whose product is zero. 


10.2.3 Give an example of a zero divisor in Z/4Z.
10.2.4 Explain why Z/nZ has zero divisors for any n that is not prime. 


Zero divisors prevent us from extending Z/nZ to a field by adjoining “fractions”, 
the way we extend Z to Q for example. 


10.2.5 If a is a zero divisor in Z/nZ, show that we cannot consistently adjoin an


10.3 Algebraic integers


Algebraic numbers 


Many concepts of ring theory originated in Dedekind’s theory of algebraic 
integers. Dedekind generalized the idea of embedding the ordinary integers 
in the field of rational numbers by embedding various rings of algebraic 
integers in fields of algebraic numbers. 


Definition. A number $\alpha \in \mathbb{C}$ is algebraic if

\[ a_m \alpha^m + a_{m-1} \alpha^{m-1} + \cdots + a_1 \alpha + a_0 = 0 \quad \text{where } a_0, a_1, \ldots, a_m \in \mathbb{Z} \text{ and } a_m \neq 0, \]

and it is of degree $m$ if it satisfies no such equation of lower degree.

Examples are:

• rational numbers, which are the algebraic numbers of degree 1,


• $\sqrt{2}$, which is of degree 2 because it satisfies $x^2 - 2 = 0$ but no equation
of lower degree (since $\sqrt{2}$ is irrational).


Like the rationals, the set of all algebraic numbers is a field, though this
is not obvious. It is not even clear that the algebraic numbers are closed
under +, for example,

$\sqrt{2}$ satisfies $x^2 - 2 = 0$, and hence is algebraic,

$\sqrt{3}$ satisfies $x^2 - 3 = 0$, and hence is algebraic,

but what equation does $\sqrt{2} + \sqrt{3}$ satisfy?

We do not prove that the algebraic numbers form a field here, but we can
show that $\sqrt{2} + \sqrt{3}$ satisfies a polynomial equation with integer
coefficients. This follows from what we are about to prove about algebraic
integers, whose definition we recall from Section 7.4.


Definition. A number $\alpha \in \mathbb{C}$ is an algebraic integer if it satisfies a monic
polynomial equation with integer coefficients, that is

\[ \alpha^m + a_{m-1} \alpha^{m-1} + \cdots + a_1 \alpha + a_0 = 0 \quad \text{where } a_0, a_1, \ldots, a_{m-1} \in \mathbb{Z}. \]

Examples are:


• ordinary integers, which are algebraic integers of degree 1,

• $\sqrt{2}$ and $\sqrt{3}$, because they satisfy the monic equations $x^2 - 2 = 0$ and
$x^2 - 3 = 0$ respectively,


• $(-1 + \sqrt{-3})/2$, because it satisfies $x^2 + x + 1 = 0$.


On the other hand, the only rational algebraic integers are the ordinary 
integers, as we proved in Section 7.4. 


The ring of algebraic integers 


The algebraic integers lie in the ring C. Hence to prove that they form a 
ring it suffices to prove that they are closed under +, —, and x. This was 
first proved by Eisenstein, but we follow a more modern proof given by 
Dedekind. 


Closure properties of algebraic integers. /f a and B are algebraic inte- 
gers then so are a+ B, a— B, and a. 


Proof. By hypothesis, a and B satisfy equations 


a"ta, jo! 4..-+aat+a,=0 where dp,d,,...,4,,_, € Z, 
B'+b, ,B"l+---+b,B+b,=0 where by,b,,...,b,_, €Z. 


These equations show: 


® 0" = —a)—a,A—-+-— a,,_ja| is a linear combination of 1, a, 
., a”! with integer coefficients. 


e gril — Ay — a, or —...7a,_,0" is a linear combination of 
l,a,...,a@”~! with integer coefficients (since a” can be rewritten 
in terms of 1,a@,..., a7). 

e Similarly, every power of @ is a linear combination of 1,a,...,a”"~! 
with integer coefficients. 

e Likewise, every power of B is a linear combination of 1,B,...,B”~! 


with integer coefficients. 


e Therefore, every polynomial in a and f is a linear combination of 
terms oa B/ withO <i<m—landO<j<n-1. 
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Thus if we denote the mn products a B/ by ,,@>,.--, Onn, WE Can 
write any polynomial @ in a, B with integer coefficients as a linear com- 
bination of @,,@,,...,@mnn with integer coefficients. In particular, if @ is 
any one of 7+ 8B, a— B, or ~B we have 


O=k,O, +++ +Knn®@m forsome k,,k,,...,kmn € Z. (*) 


From this we obtain mn equations in the mn “unknowns” @,, @,..-,@Qnn 
by multiplying (*) by @,, @5,..., @mn and rewriting each right-hand side as 
a linear combination of @,, @,,..., @nn with integer coefficients kv’ ) 


if / 

Ww, — ky O, oe see + Kon Onn 
ff i 

OW, = ky OM, + +++ + Kp, Omn 


OOnn = xen) O, 4... kon) Onn 


These are homogeneous equations in @,, @),..., yn, With a nonzero solu- 
tion, and hence their determinant must be zero. That is, 


(ki-o kl... in 
ee 
| = 0. 
am ; 
Kenn) lam) km) — 


This determinant is a polynomial in ω, with coefficients in Z, and with
coefficient of ω^{mn} equal to ±1. Hence ω = α + β, α − β, or αβ is an
algebraic integer. □
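
As an illustrative sketch (not part of Dedekind's argument), the determinant in the proof can be computed for one concrete case. We assume the example α = √2, β = √3, so that m = n = 2, and we take the four products in the order ω_1 = 1, ω_2 = √2, ω_3 = √3, ω_4 = √6. The matrix K holds the integer coefficients of ωω_i for ω = α + β. The name char_poly and the use of the Faddeev-LeVerrier recurrence to expand det(xI − K) are our own choices, not something taken from the text.

```python
from fractions import Fraction

# Row i holds the integer coefficients of omega * omega_i in the basis
# (1, sqrt2, sqrt3, sqrt6), for the assumed example omega = sqrt2 + sqrt3.
K = [
    [0, 1, 1, 0],   # omega * 1     = sqrt2 + sqrt3
    [2, 0, 0, 1],   # omega * sqrt2 = 2 + sqrt6
    [3, 0, 0, 1],   # omega * sqrt3 = 3 + sqrt6
    [0, 3, 2, 0],   # omega * sqrt6 = 3*sqrt2 + 2*sqrt3
]


def char_poly(matrix):
    """Coefficients [1, c_(n-1), ..., c_0] of det(xI - matrix).

    Uses the Faddeev-LeVerrier recurrence (an assumed, swapped-in technique).
    """
    n = len(matrix)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    aux = [row[:] for row in identity]
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        prod = [[sum(matrix[i][t] * aux[t][j] for t in range(n))
                 for j in range(n)] for i in range(n)]
        c = -Fraction(sum(prod[i][i] for i in range(n)), k)
        coeffs.append(c)
        aux = [[prod[i][j] + c * identity[i][j] for j in range(n)]
               for i in range(n)]
    return [int(c) for c in coeffs]


coefficients = char_poly(K)

# Numerical sanity check: the polynomial should vanish at sqrt2 + sqrt3.
omega = 2 ** 0.5 + 3 ** 0.5
residual = sum(c * omega ** (4 - i) for i, c in enumerate(coefficients))
```

For this example the coefficients describe x⁴ − 10x² + 1, a monic polynomial with integer coefficients satisfied by √2 + √3, as the proof predicts.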


Exercises 


The units among the algebraic integers divide 1, by definition, and hence they are
the algebraic integers α such that α^{−1} is also an algebraic integer.

10.3.1 Deduce that α is an algebraic integer unit if and only if α satisfies a monic
polynomial equation with integer coefficients and constant term ±1.

10.3.2 Verify that such polynomials are satisfied by the units ±ζ_3 and ±ζ_3² of
Z[ζ_3].

10.4 Quadratic fields and their integers


The ring of all algebraic integers has some inconvenient properties. For
example, the square root of any algebraic integer α is also an algebraic
integer, and hence α has the factorization α = √α √α. This shows that
there are no “primes” in the ring of algebraic integers. For this reason, one
usually works in a ring of algebraic integers of bounded degree, such as the
rings Z[i] and Z[√−2] we used previously. We now generalize the latter
examples to the ring of integers of a quadratic field.

Each quadratic field can be written Q(√d), where d ∈ Z and Q(√d) is
“the smallest field containing Q and √d” or, in other words, the result of
closing the set Q ∪ {√d} under the operations +, −, ×, and ÷ (by nonzero
members). It is closure under ÷, as well as +, −, and ×, that produces a
field, and we use the round bracket notation to distinguish the result from
closure under +, −, and × alone, which produces a ring but not necessarily
a field. (For example, Z[i] is the closure of Z ∪ {i} under +, −, and ×, and
it is not a field.)


The round bracket notation is superfluous in this case, because in fact
Q(√d) = Q[√d].

Characterization of Q(√d). Q(√d) = Q[√d] = {a + b√d : a, b ∈ Q}.

Proof. Each number a + b√d with a, b ∈ Q certainly results from √d and
the members a, b of Q by + and ×. Hence

{a + b√d : a, b ∈ Q} ⊆ Q[√d] ⊆ Q(√d).

Conversely, we can show that {a + b√d : a, b ∈ Q} is closed under +,
−, and ×, and hence {a + b√d : a, b ∈ Q} ⊇ Q[√d]. We also show it is closed
under ÷, hence {a + b√d : a, b ∈ Q} ⊇ Q(√d).

The set {a + b√d : a, b ∈ Q} is obviously closed under + and −. It is
closed under × because

(a_1 + b_1√d)(a_2 + b_2√d) = a_1a_2 + b_1b_2d + (a_1b_2 + a_2b_1)√d.

And it is closed under ÷ (by nonzero members) because

\[
\frac{1}{a + b\sqrt{d}}
= \frac{a - b\sqrt{d}}{(a + b\sqrt{d})(a - b\sqrt{d})}
= \frac{a - b\sqrt{d}}{a^2 - b^2 d}
= \frac{a}{a^2 - b^2 d} - \frac{b}{a^2 - b^2 d}\sqrt{d}. \qquad \Box
\]
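
The two closure formulas can be tried out with exact rational arithmetic. The following is a minimal sketch under our own conventions: a pair (a, b) stands for a + b√d, the names mul and inv are hypothetical, and d = 5 is chosen only for illustration (any d that is not a rational square keeps the denominator a² − b²d nonzero).

```python
from fractions import Fraction

D = 5  # illustrative choice of d; assumed not to be a rational square


def mul(x, y, d=D):
    """(a1 + b1*sqrt(d)) * (a2 + b2*sqrt(d)) as a pair (a, b)."""
    (a1, b1), (a2, b2) = x, y
    return (a1 * a2 + b1 * b2 * d, a1 * b2 + a2 * b1)


def inv(x, d=D):
    """1 / (a + b*sqrt(d)), using the conjugate a - b*sqrt(d)."""
    a, b = x
    denom = a * a - b * b * d  # nonzero for (a, b) != (0, 0)
    return (a / denom, -b / denom)


x = (Fraction(1, 2), Fraction(3))   # the number 1/2 + 3*sqrt(5)
check = mul(x, inv(x))              # should be the pair for 1 + 0*sqrt(5)
```

Both results again have rational coordinates, which is exactly the closure claimed in the proof.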


The integers of Q(√d) include ±√d (since ±√d satisfy the monic
equation x² − d = 0) and hence they include all members of Z[√d] (by
closure of the algebraic integers under +). But sometimes they include more,
which is why we have to use the awkward phrase “integers of Q(√d)”
instead of Z[√d]. For example, Q(√−3) includes (−1 + √−3)/2, and the
latter is an algebraic integer because it satisfies x² + x + 1 = 0.

The precise situation is described in the following theorem. Before we
state the theorem and begin its proof we note that the integer d in Q(√d)
can be assumed squarefree, that is, not divisible by any square > 1. This
is so because if d = n²c for some n, c ∈ Z, then Q(√d) = Q(n√c), which
equals Q(√c) by closure under × and ÷. The other thing to remember is
that any square is ≡ 0 or 1 (mod 4).


Integers of Q(√d). When d ≢ 1 (mod 4) the integers of Q(√d) are the
a + b√d with a, b ∈ Z. When d ≡ 1 (mod 4) the integers of Q(√d) are the
a + b√d with a, b ∈ Z or a + 1/2, b + 1/2 ∈ Z.


Proof. If a + b√d ∈ Q(√d) is an algebraic integer, and hence a solution
of some equation x² + Ax + B = 0 with A, B ∈ Z, then it follows from the
quadratic formula that the other solution is a − b√d. Hence

x² + Ax + B = (x − (a + b√d))(x − (a − b√d)).

Equating coefficients we get

A = −2a, B = a² − db²,

which shows that 2a and a² − db² are ordinary integers.

In particular, a ∈ Z or a + 1/2 ∈ Z. In the first case (2a even),

a ∈ Z ⇒ a² ∈ Z
⇒ db² ∈ Z since a² − db² ∈ Z
⇒ b² ∈ Z since no integer square n² > 1 divides d,
and therefore b² ≠ m²/n² with n² > 1,
⇒ b ∈ Z.

The case a + 1/2 ∈ Z is when 2a is odd, so (2a)² ≡ 1 (mod 4), and then

a² − db² ∈ Z ⇒ (2a)² − d(2b)² ≡ 0 (mod 4)
⇒ d(2b)² ≡ (2a)² ≡ 1 (mod 4)
⇒ d ≡ 1 (mod 4) and (2b)² ≡ 1 (mod 4)
since (2b)² ≡ 3 (mod 4) is impossible
⇒ d ≡ 1 (mod 4) and 2b ≡ 1 (mod 2)
⇒ d ≡ 1 (mod 4) and b + 1/2 ∈ Z.


Finally, to see that every number a + b√d with a + 1/2, b + 1/2 ∈ Z is an
integer of Q(√d) when d = 4m + 1, it suffices to check the coefficients of
the equation it satisfies: x² − 2ax + (a² − db²) = 0. They are easily shown
to be integers. □
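
The criterion extracted in the proof (2a and a² − db² must both be ordinary integers) is easy to test mechanically. The sketch below assumes d is squarefree and uses a hypothetical helper name, is_integer_of_field; it is meant only as a way of experimenting with the theorem, not as part of its proof.

```python
from fractions import Fraction


def is_integer_of_field(a, b, d):
    """Test whether a + b*sqrt(d) is an integer of Q(sqrt(d)).

    Checks that the coefficients of x^2 - 2a x + (a^2 - d b^2) are integers.
    Assumes d is squarefree and a, b are rational.
    """
    a, b = Fraction(a), Fraction(b)
    trace = 2 * a
    norm = a * a - d * b * b
    return trace.denominator == 1 and norm.denominator == 1


half = Fraction(1, 2)
examples = {
    "(-1 + sqrt(-3))/2": is_integer_of_field(-half, half, -3),  # d = 1 mod 4
    "(1 + sqrt(-5))/2": is_integer_of_field(half, half, -5),    # d = 3 mod 4
    "(1 + sqrt(5))/2": is_integer_of_field(half, half, 5),      # d = 1 mod 4
}
```

Half-integer coordinates pass the test only when d ≡ 1 (mod 4), in agreement with the theorem.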


Exercises 


10.4.1 Show that each element of Q(√d) is an integer of Q(√d) divided by an
ordinary integer.

The second solution, α′, of the monic quadratic equation with integer coefficients
satisfied by a quadratic integer α is called the conjugate of α. This generalizes
the notion of “complex conjugate” in Z[i] and the notion of “conjugate surd” from
high school algebra.

10.4.2 If α is rational, what is α′?

10.4.3 Verify that conjugation is a ring automorphism of the integers of Q(√d),
that is

• The map α ↦ α′ is 1-to-1 and onto.

• (α + β)′ = α′ + β′ and (αβ)′ = α′β′.

10.5 Norm and units of quadratic fields

The norm on Q(√d) is the function defined by

norm(a + b√d) = a² − db².


It follows that an integer a + b√d of Q(√d) has (ordinary) integer norm,
because the proof of the last theorem of the previous section showed that
a² − db² is an ordinary integer in that case. This norm includes the norms
previously defined for d = −1, d = −2, and d = n, and like them it is
multiplicative:

norm(x_1 x_2) = norm(x_1) norm(x_2) for any x_1, x_2 ∈ Q(√d).


This can be checked by setting x_1 = a_1 + b_1√d, x_2 = a_2 + b_2√d and
working out both sides. One finds the identity

(a_1a_2 + db_1b_2)² − d(a_1b_2 + a_2b_1)² = (a_1² − db_1²)(a_2² − db_2²),

which is Brahmagupta’s identity of Section 5.4 when d > 0 and Diophantus’
identity of Section 1.8 when d = −1. These properties of the norm
imply that, if x_1 divides x_2 in the integers of Q(√d), then norm(x_1) divides
norm(x_2) in the ordinary integers.
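
The multiplicative property can also be spot-checked by brute force. This is only a finite check on small integer coordinates, under our own conventions: a pair (a, b) stands for a + b√d, and the function names are hypothetical.

```python
from itertools import product


def norm(a, b, d):
    """norm(a + b*sqrt(d)) = a^2 - d*b^2."""
    return a * a - d * b * b


def mul(x, y, d):
    """Product of a1 + b1*sqrt(d) and a2 + b2*sqrt(d) as a pair."""
    (a1, b1), (a2, b2) = x, y
    return (a1 * a2 + d * b1 * b2, a1 * b2 + a2 * b1)


def norm_is_multiplicative(d, bound=3):
    """Check norm(x1*x2) == norm(x1)*norm(x2) for all coordinates in [-bound, bound]."""
    values = range(-bound, bound + 1)
    return all(
        norm(*mul((a1, b1), (a2, b2), d), d)
        == norm(a1, b1, d) * norm(a2, b2, d)
        for a1, b1, a2, b2 in product(values, repeat=4)
    )
```

A call such as norm_is_multiplicative(-5) exercises the identity for every pair of elements with coordinates between −3 and 3.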

As in any ring, the units among the integers of Q(√d) are the elements
that divide 1. It follows, by the previous remark, that the units of Q(√d)
are integers with norm ±1. Conversely, integers of norm ±1 are units
because if a + b√d is an integer (with a, b ∈ Z or a + 1/2, b + 1/2 ∈ Z)
with norm ±1 then

±1 = a² − db² = (a − b√d)(a + b√d),

which shows that a + b√d divides 1.

When d > 1 there are infinitely many units among the integers of
Q(√d), corresponding to the infinitely many solutions of the Pell equation
x² − dy² = 1. For example, the solutions of x² − 2y² = 1 are pairs
(x, y) for which x + y√2 is a unit of Q(√2). We found these solutions in
Section 5.2, giving the units ±(3 + 2√2)^n for n ∈ Z.
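
As a small illustration, the positive powers of 3 + 2√2 can be generated with integer arithmetic alone. In this sketch a pair (x, y) stands for x + y√2, and the function name pell_units is our own.

```python
def pell_units(count):
    """First `count` positive powers of 3 + 2*sqrt(2), as pairs (x, y)."""
    x, y = 1, 0
    powers = []
    for _ in range(count):
        # (x + y*sqrt2)(3 + 2*sqrt2) = (3x + 4y) + (2x + 3y)*sqrt2
        x, y = 3 * x + 4 * y, 2 * x + 3 * y
        powers.append((x, y))
    return powers


first_three = pell_units(3)
norms = [x * x - 2 * y * y for x, y in pell_units(10)]
```

Each pair produced this way satisfies x² − 2y² = 1, so every power is again a unit.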

On the other hand, if d < 0 then there are only finitely many integers
or half integers a, b with a² − db² = 1, and hence only finitely many units.
In particular:

• the units of Z[i] are ±1, ±i,
• the units of Z[√−2] are ±1,
• the units of Z[ζ_3], where ζ_3 = (−1 + √−3)/2, are ±1, ±ζ_3, ±ζ_3².

By the theorem in the previous section, Z[ζ_3] is the ring of integers of the
field Q(√−3), and in fact it has the most units of any ring of integers of
Q(√d) with d < 0.

The fields Q(√d) with d < 0 are called imaginary quadratic fields and
their basic theory is particularly elegant. One advantage they have over real
quadratic fields is that when d < 0, norm(a + b√d) = a² − db² is simply the
square of the distance of a + b√d from 0 in the complex plane. This makes
certain properties geometrically obvious, such as the division property for
Z[i] found in Section 6.4. Another example is the following theorem.

Units of imaginary quadratic fields. The only units among the integers
of imaginary quadratic fields are ±1, ±i, ±ζ_3, and ±ζ_3².


Proof. Since units have norm 1, units of an imaginary quadratic field
Q(√d) lie at distance 1 from 0 in the complex plane. But we also know
that the integers of Q(√d) are of the form a + b√d where a, b ∈ Z or
a + 1/2, b + 1/2 ∈ Z. If |d| > 5, all such integers except 0, ±1 are at
distance > 1 from 0. For d = −2 and d = −5 only a, b ∈ Z occur, and
a² + 2b² = 1 or a² + 5b² = 1 forces b = 0, so again the only units are ±1.
Thus the only units apart from ±1 are those listed above,
occurring in Q(i) and Q(√−3). □
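
The theorem can be confirmed for small d by direct enumeration. The sketch below is our own: it assumes d is negative and squarefree, works with the doubled coordinates A = 2a and B = 2b so that everything stays in the ordinary integers, and uses the hypothetical name units.

```python
def units(d):
    """Units of the integers of Q(sqrt(d)) for squarefree d < 0.

    Each unit a + b*sqrt(d) is returned as the pair (2a, 2b).
    """
    found = []
    # norm 1 means A^2 - d*B^2 = 4, so |A| <= 2 and |B| <= 2 when d < 0
    for A in range(-2, 3):
        for B in range(-2, 3):
            if A * A - d * B * B != 4:
                continue
            both_even = A % 2 == 0 and B % 2 == 0        # a, b in Z
            both_odd = A % 2 == 1 and B % 2 == 1         # a + 1/2, b + 1/2 in Z
            if both_even or (both_odd and d % 4 == 1):
                found.append((A, B))
    return found


unit_counts = {d: len(units(d)) for d in (-1, -2, -3, -5, -6, -7, -10, -11)}
```

Only d = −1 and d = −3 give more than the two units ±1, with four and six units respectively.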

The imaginary quadratic fields are better understood than the real ones. 
For example, Gauss (1801) found that the integers of Q(√d) have unique
prime factorization for d = —1, —2, —3, —7, —11, —19, —43, —67, —163, 
and in 1967 Baker and Stark showed that these are the only imaginary 
quadratic fields with unique prime factorization. The real quadratic fields 
with unique prime factorization are still not known, nor is it known whether 
there are infinitely many of them. 


Exercises 


An equivalent way to define the norm, which generalizes to arbitrary algebraic
number fields of finite degree, is in terms of conjugates.

10.5.1 Show that norm(α) = αα′.

10.5.2 If α is rational, what is norm(α)?

10.5.3 Hence deduce the multiplicative property of the norm from the
multiplicative property of conjugation.

10.6 Discussion


The rings Z, Q, R, and C were known before the ring concept had a name.
Rings began to proliferate in the mid-19th century when Kummer studied
the cyclotomic rings Z[ζ_n] and Dedekind sought a general theory of
algebraic integers. Dedekind’s first account of his theory appeared as a
supplement to Dirichlet’s Vorlesungen über Zahlentheorie (lectures on number
theory) in 1871. At this time all the examples of interest to Dedekind were
rings of algebraic numbers, so a ring (still unnamed) in 1871 could be
defined as a subset of C closed under +, −, and ×.

The need for a definition by axioms, rather than closure properties, was 
gradually felt as other sets with + and × operations came to be studied
intensively. The congruence class rings Z/nZ (defined in essentially the 
modern way by Dedekind (1857), with congruence classes as the objects 
being added and multiplied) were one class of examples. Another was 
the class of matrix rings, which were shown to include the quaternions by 
Cayley (1858) and many other structures by Peirce (1881). 

Matrix rings, while generally noncommutative, also include many
interesting commutative rings. We saw how C can be represented by 2 × 2
real matrices in Section 8.1. It is also possible to represent Z[α], for any
algebraic integer α of degree n, by n × n integer matrices, and Q(α) by
n × n rational matrices. Briefly, the idea is this.

If the monic equation satisfied by α is

α^n + a_{n−1} α^{n−1} + ⋯ + a_1 α + a_0 = 0 with a_0, ..., a_{n−1} ∈ Z

then α^n = −a_{n−1} α^{n−1} − ⋯ − a_1 α − a_0 and hence all powers α^n, α^{n+1}, ...
are rational linear combinations of 1, α, α², ..., α^{n−1}. This allows Q(α)
to be viewed as a vector space over Q with basis {1, α, α², ..., α^{n−1}}.

Multiplication by α induces a linear map of this vector space with matrix

\[
M_\alpha =
\begin{pmatrix}
0 & 0 & \cdots & 0 & -a_0 \\
1 & 0 & \cdots & 0 & -a_1 \\
0 & 1 & \cdots & 0 & -a_2 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & 1 & -a_{n-1}
\end{pmatrix}
\]

because right multiplication of the row vector (1 α α² ⋯ α^{n−2} α^{n−1}) by
M_α yields its multiple by α:

(α α² α³ ⋯ α^{n−1} −a_{n−1} α^{n−1} − ⋯ − a_1 α − a_0).

It follows that matrix polynomials in M_α, with rational coefficients,
behave the same as the corresponding polynomials in α, that is, the same as
elements of Q(α).
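
A quick way to see this representation at work is to build the matrix for a specific monic equation and confirm that the matrix satisfies the same equation as α. The sketch below is ours: it assumes the row-vector convention described above, takes the coefficient list in the order a_0, a_1, ..., a_{n−1}, and uses the hypothetical names companion, matmul and poly_at. The examples x³ − 2 and x² + x + 1 are chosen only for illustration.

```python
def companion(a):
    """Matrix of multiplication by alpha, for x^n + a[n-1] x^(n-1) + ... + a[0] = 0."""
    n = len(a)
    M = [[0] * n for _ in range(n)]
    for j in range(n - 1):
        M[j + 1][j] = 1        # column j picks out the next power of alpha
    for i in range(n):
        M[i][n - 1] = -a[i]    # last column rewrites alpha^n via the monic equation
    return M


def matmul(A, B):
    """Product of two square integer matrices of the same size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def poly_at(a, M):
    """Evaluate x^n + a[n-1] x^(n-1) + ... + a[0] at the matrix M."""
    n = len(a)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # M^0
    total = [[0] * n for _ in range(n)]
    for coeff in list(a) + [1]:
        total = [[total[i][j] + coeff * power[i][j] for j in range(n)]
                 for i in range(n)]
        power = matmul(power, M)
    return total


cube_root_of_two = companion([-2, 0, 0])   # alpha^3 - 2 = 0
zeta_3 = companion([1, 1])                 # alpha^2 + alpha + 1 = 0
```

In both examples poly_at returns the zero matrix, so the matrix obeys the same monic equation as the number it represents.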

All of these examples were finally unified under the abstract ring con- 
cept, defined axiomatically by Fraenkel (1914) and developed by Emmy 
Noether and her students in the 1920s. Noether always said “Es steht schon 
bei Dedekind” (“It’s already in Dedekind”), and she urged her students to
read all of Dedekind’s works on algebraic number theory. These included 
three different versions of his last supplement to Dirichlet’s Vorlesungen, in 
1871, 1879, and 1894, and a separately published work that is now avail- 
able in English translation, Dedekind (1877). The latter is probably the 
easiest introduction to Dedekind’s work for the modern reader. 

Among other things, Dedekind (1877) generalizes the concept of norm 
to an arbitrary algebraic number field Q(α). He defines the conjugates
α′, α″, ... of α to be the other solutions of the minimal degree monic
equation satisfied by α and defines norm(α) as the product αα′α″⋯. It can
then be shown that norm(α) is an ordinary integer when α is an integer of
Q(α), and that norm(αβ) = norm(α) norm(β). The proofs are not hard
but the latter requires concepts from the companion volume to this one, 
Elements of Algebra (field isomorphisms). It can be said that algebraic 
number theory is where the concepts from the theory of algebraic equa- 
tions (groups and fields) begin to interact with concepts from the theory of 
Diophantine equations (rings). 


PREVIEW 


This chapter pursues the idea that a number is known by the set of
its multiples, so an “ideal number” is known by a set that behaves
like a set of multiples. Such a set I in a ring R is called an ideal, and
it is defined by closure under sums (a, b ∈ I ⇒ a + b ∈ I) and under
multiplication by all elements of the ring (a ∈ I, r ∈ R ⇒ ar ∈ I).

The set (a) = {ar : r ∈ R} of all multiples of any a ∈ R is an ideal,
called the principal ideal generated by a. Any nonprincipal ideal I
is therefore not the set of multiples of any actual member of R; it
represents an “ideal member” of R.

In Z, every ideal is principal, and the properties of ideals reflect
known properties of integers. In particular:

a divides b ⇔ (a) contains (b)
p is prime ⇔ (p) is maximal.

In rings where unique prime factorization fails, such as Z[√−5],
nonprincipal ideals exist. We find such an ideal as the “gcd ideal”
of 2 and 1 + √−5, {2m + (1 + √−5)n : m, n ∈ Z}, and confirm that
this ideal is nonprincipal by looking at its shape in the plane.

The ideal {2m + (1 + √−5)n : m, n ∈ Z} “divides” the principal ideal
(2) because it contains it, and it is “prime” because it is maximal.
But if an ideal I “divides” an ideal J there should be a “product” IK
of ideals I and K equal to J.


We hope that (2) splits into such a product, because this may rectify
the failure of unique prime factorization in Z[√−5] exhibited by

2 × 3 = (1 + √−5)(1 − √−5).

Thus the final step is to define the product of ideals. It then turns
out, as hoped, that the two distinct products equal to 6 in Z[√−5],
2 × 3 and (1 + √−5)(1 − √−5), split into the same product of prime
ideals.


11.1 Ideals and the gcd 


In Section 7.4 we had our first brush with failure of unique prime
factorization when we found that 4 has two distinct prime factorizations in Z[√−3]:

4 = 2 × 2 = (1 + √−3)(1 − √−3).


This problem was fixed by enlarging Z[√−3] to Z[(1 + √−3)/2], where the
factorizations 2 × 2 and (1 + √−3)(1 − √−3) are actually the same, up
to unit factors. This is because Z[(1 + √−3)/2] contains the units
(1 + √−3)/2 and (1 − √−3)/2 whose product is 1, and therefore

2 × 2 = 2((1 + √−3)/2) × 2((1 − √−3)/2) = (1 + √−3)(1 − √−3).


However, this is in some sense a lucky escape, and a more serious
problem occurs in Z[√−5], where 6 has two different factorizations:

6 = 2 × 3 = (1 + √−5)(1 − √−5).

Using the norm a² + 5b² of any a + b√−5 ∈ Z[√−5], it can be checked
that none of these factors are products of elements of smaller norm. Nor
are the units ±1 of Z[√−5] able to account for the difference between the
factorizations. Thus 6 has two distinct prime factorizations in Z[√−5].
And we cannot get around this problem by a simple enlargement of the
ring, like that of Z[√−3] to Z[(1 + √−3)/2], because Z[√−5] already contains
all the integers of Q(√−5).
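
The norm check mentioned above amounts to asking whether 2 or 3, the only proper divisors greater than 1 of the norms 4, 9 and 6, can occur as a value of a² + 5b². A brute-force sketch, with a hypothetical function name and a search bound we chose ourselves (|a|, |b| ≤ 3 is ample for targets this small), is:

```python
def attained_norms(targets, d=5, bound=3):
    """Which target values occur as a^2 + d*b^2 with |a|, |b| <= bound?"""
    norms = {a * a + d * b * b
             for a in range(-bound, bound + 1)
             for b in range(-bound, bound + 1)}
    return norms & set(targets)


norms_of_factors = attained_norms({4, 6, 9})   # norms of 2, 1 ± sqrt(-5), 3
smaller_norms = attained_norms({2, 3})         # would be needed for a proper factor
```

Since no element of Z[√−5] has norm 2 or 3, none of the four factors can split into two factors of smaller norm.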

In such situations, Kummer and Dedekind were able to restore unique
prime factorization by extending the concepts of product and divisibility
to what Kummer called “ideal numbers” and what Dedekind called ideals.
We find our first “ideal number” by searching for the gcd of 2 and 1 + √−5
in Z[√−5], based on a new approach to the gcd in Z.

The gcd in Z revisited

The basic idea of Kummer and Dedekind is that a number is known by the
set of its multiples. We illustrate this idea, and its application to the gcd,
by the example of gcd(4, 6). Figure 11.1 shows the set of multiples of 4,
which we denote by (4), as black dots among the integers:


Figure 11.1: The multiples of 4 


Similarly, Figure 11.2 shows the set (6) of multiples of 6: 


Figure 11.2: The multiples of 6


Finally, Figure 11.3 shows all sums of members of (4) and members of (6). 
We denote the set of these sums by (4) + (6): 


Figure 11.3: The sums of multiples of 4 and multiples of 6 


It is clear that (4) + (6) = (2) and that 2 = gcd(4, 6), so the multiples
of gcd(4, 6) are obtained by adding all multiples of 4 to all multiples of 6.
More generally, we let (k) denote the set {kn : n ∈ Z} of multiples of k for
any k ∈ Z. Then we have: the set (a) + (b) = {am + bn : m, n ∈ Z} equals
the set (gcd(a, b)) of multiples of gcd(a, b).
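
The statement (a) + (b) = (gcd(a, b)) can be explored numerically on a finite window of the integers. The sketch below is only a finite check, not a proof; the function names, the window size and the bound on m and n are our own assumptions.

```python
from math import gcd


def sums_of_multiples(a, b, bound):
    """All values a*m + b*n with |m|, |n| <= bound."""
    return {a * m + b * n
            for m in range(-bound, bound + 1)
            for n in range(-bound, bound + 1)}


def agrees_with_gcd(a, b, window=20, bound=50):
    """Compare (a) + (b) with the multiples of gcd(a, b) on [-window, window]."""
    g = gcd(a, b)
    inside = {s for s in sums_of_multiples(a, b, bound) if abs(s) <= window}
    return inside == {k for k in range(-window, window + 1) if k % g == 0}
```

For example, agrees_with_gcd(4, 6) compares the sums 4m + 6n with the even numbers between −20 and 20.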

We prove this theorem in the next section. It gives an abstract alterna- 
tive to the Euclidean algorithm for finding the gcd, with the advantage of 
being applicable to any ring R. This is done by replacing the sets (k) above 
by the following more general concept: 


Definition. An ideal in a ring R is a subset I of R such that

• a ∈ I and b ∈ I ⇒ a + b ∈ I,

• a ∈ I and r ∈ R ⇒ ar ∈ I.

In other words, I is closed under addition and closed under multiplication
by elements of R. It follows that I is also closed under subtraction, because
b ∈ I ⇒ −b ∈ I (multiplication by −1 ∈ R) and a − b = a + (−b).

It is easily checked that if I and J are ideals, then so is

I + J = {i + j : i ∈ I and j ∈ J}.

The latter is what we call gcd(I, J), and we use it in Section 11.5 to find
the gcd of 2 and 1 + √−5 in Z[√−5]. But first we investigate the concept
of ideal in the more familiar ring Z.


Exercises 


The theorem that (a) + (b) = (gcd(a,b)) in Z can be proved directly, and it is 
worth doing so in advance of the proof using ideal theory in the next section. 


11.1.1 Show that all members of (a) + (b) = {am + bn : m, n ∈ Z} are multiples
of gcd(a, b).

11.1.2 Show that {am + bn : m, n ∈ Z} is closed under difference, and deduce that
all its members are multiples of the smallest positive member, c.

11.1.3 Deduce that c = gcd(a, b), and hence that (a) + (b) = (gcd(a, b)).


11.2 Ideals and divisibility in Z 


The simplest examples of ideals occur in Z. For instance (2) = {2n : n ∈ Z}
is an ideal, as is (6) = {6n : n ∈ Z}. In fact, for any a ∈ Z, the set

(a) = {an : n ∈ Z}

is an ideal called the principal ideal generated by a. Divisibility in Z
corresponds to containment of principal ideals. For example

2 divides 6, (2) contains (6),

and in general

a divides b ⇔ (a) contains (b).

Dedekind’s idea was to define “divisibility” of ideals in a ring R by
saying that

I “divides” J ⇔ I contains J.

(We put this notion of division in quotes for now, because it remains to be
seen whether it is consistent with the usual notion of divisibility: I divides
J ⇔ J = IK for some ideal K. The latter notion comes into contention
when we define the product of ideals in Section 11.7.)

Dedekind’s definition includes divisibility of elements s, t ∈ R, since in
any ring we can define the principal ideals

(s) = {sr : r ∈ R}, (t) = {tr : r ∈ R}

and show that

s divides t ⇔ (s) contains (t).

But “divisibility” of ideals generally extends the concept of divisibility of
elements, because not every ideal is principal.

In particular, we shall see that there are nonprincipal ideals in Z[√−5],
and they include “ideal primes” that restore unique prime factorization.
Before doing so, however, it is helpful to look more closely at Z from the 
viewpoint of ideals. This allows us to see how the basic theory of ideals 
elegantly includes the traditional theory of divisibility, common divisors, 
and primes. 


Ideal theory in Z 


The basic theory of divisibility in Z consists of three theorems about ideals. 
The first is a counterpart of the division property. 


Principal ideal property of Z. All ideals in Z are principal. 


Proof. Suppose I is an ideal of Z other than (0). Then I has a least positive
member, a say. Since I is closed under multiplication by members of Z, I
includes all members of

(a) = {an : n ∈ Z}.

But these are the only members of I, because if b ∈ I is not a multiple of a then

b − (greatest multiple of a less than b)

is a positive member of I less than a, contrary to assumption. □


The second theorem yields the gcd without the Euclidean algorithm,
generalizing the example of the previous section.

The gcd ideal. The set (a) + (b) = {am + bn : m, n ∈ Z} is (gcd(a, b)).

Proof. Since gcd(a, b) divides a and b, it divides each number am + bn.
Thus {am + bn : m, n ∈ Z} includes only multiples of gcd(a, b).

Now {am + bn : m, n ∈ Z} is clearly an ideal, hence by the previous
theorem it consists of the multiples of its least positive member c. Since a
and b are numbers of the form am + bn, they too are multiples of c, hence
c is a common divisor of a and b. But we already know that c is a multiple
of gcd(a, b), hence c = gcd(a, b). Thus the ideal {am + bn : m, n ∈ Z} of
multiples of c includes exactly the multiples of gcd(a, b). □


Finally, we can express the prime divisor property in terms of ideals. 
The proof is close to the one originally given for the prime divisor property 
(Section 2.4). Since “divides” means “contains” for ideals, the only ideals 
containing an ideal (p) for prime p are (p) itself and (1) = Z. 


Prime ideal property. If p is prime and the ideal (p) contains (ab), then
(p) contains (a) or (p) contains (b).

Proof. Suppose (a) ⊄ (p), so we have to prove (b) ⊆ (p).

Since the ideal {am + pn : m, n ∈ Z} contains both (p) and (a), and
(a) ⊄ (p), {am + pn : m, n ∈ Z} can only equal (1).

This means 1 = am + pn for some m, n ∈ Z and

1 = am + pn ⇒ b = abm + pbn multiplying both sides by b
⇒ b ∈ (p) since ab ∈ (p) by hypothesis and p ∈ (p)
⇒ (b) ⊆ (p) as required. □


As we know, the prime divisor property is the essence of unique prime 
factorization. However, we cannot discuss factorization of general ideals 
yet, because we have not defined the product of ideals. Likewise, we cannot 
yet define the general concept of “prime” ideal, though the prime ideal 
property for Z suggests how it should be done as soon as we have defined 
the product. 
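
The key step in the proof of the prime ideal property for Z, writing 1 = am + pn and multiplying by b, can be traced on concrete numbers. The sketch below is ours: the function names are hypothetical, p is assumed prime with p not dividing a, and m is found by a plain search over 1, ..., p − 1 rather than by any particular algorithm.

```python
def unit_combination(a, p):
    """Find (m, n) with a*m + p*n == 1, assuming p is prime and p does not divide a."""
    for m in range(1, p):
        if (a * m - 1) % p == 0:
            return m, (1 - a * m) // p
    raise ValueError("no m, n with a*m + p*n == 1")


def quotient_by_p(a, b, p):
    """Given that p divides a*b but not a, return b // p via b = a*b*m + p*b*n."""
    assert (a * b) % p == 0 and a % p != 0
    m, n = unit_combination(a, p)
    assert b == a * b * m + p * b * n      # multiply 1 = a*m + p*n by b
    return (a * b // p) * m + b * n        # both terms are multiples of p


example = quotient_by_p(10, 21, 7)         # 7 divides 210 but not 10
```

The returned value is b/p, computed only from the two facts used in the proof: p divides ab, and 1 = am + pn.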


Exercises 


Just as gcd(a, b) results from a simple operation on (a) and (b), so does lcm(a, b).
Moreover, the operation makes sense on any ideals, and hence gives a general
definition of lcm in any ring.

11.2.1 Show that (lcm(a, b)) = (a) ∩ (b).

11.2.2 For any ideals I and J in a ring R, prove that I ∩ J is an ideal.

11.3 Principal ideal domains

Z is an example of a principal ideal domain, a ring R in which every ideal
is of the form (a) = {ar : r ∈ R}.

Other examples are rings with a Euclidean algorithm, such as Z[i],
Z[√−2], and Z[ζ_3]. As we know, a ring R has a Euclidean algorithm if
R has the division property, which we can formalize as follows.

Definition. R is a Euclidean ring if there is a nonnegative integer-valued
function |r| on R such that |r| = 0 only if r = 0, and for any a, b ∈ R with
|b| > 0 there are q, r ∈ R such that a = qb + r with 0 ≤ |r| < |b|.

For the rings mentioned above, the function |r| is the absolute value
function. For certain examples where the norm can be negative, such as
Z[√2], the square root of the absolute value of the norm will serve as |r|.
(Exercise 9.1.6 is then a proof that Z[√2] is a Euclidean ring.)

The theorems about Z in the previous section generalize to the follow- 
ing theorems about principal ideal domains. Since the proofs are similar, 
we abbreviate them slightly here. 

Principal ideal property of Euclidean rings. Any Euclidean ring is a 
principal ideal domain. 

Proof. If I ≠ (0) is an ideal of R let b ∈ I be of minimal norm |b| > 0. Since I
is an ideal it includes all multiples of b by elements of R.

Conversely, if a ∈ I is not a multiple of b, then we have a = qb + r with
0 < |r| < |b|, and r = a − qb ∈ I, contrary to the minimality of b.

Thus I = (b). □
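
For Z[i], the division property used in this proof can be carried out explicitly by rounding the exact quotient a/b to the nearest Gaussian integer. The following is a minimal sketch under our own conventions: a pair (x, y) stands for x + yi, the function names are hypothetical, and we compare norms x² + y² (the squares of the absolute values) so that all arithmetic stays in the integers.

```python
def norm(z):
    """Squared absolute value of the Gaussian integer z = (x, y)."""
    x, y = z
    return x * x + y * y


def divide(a, b):
    """Return (q, r) in Z[i] with a = q*b + r and norm(r) < norm(b), for b != 0."""
    (a1, a2), (b1, b2) = a, b
    nb = norm(b)
    # a / b = a * conj(b) / norm(b); keep the numerator exact
    num_re = a1 * b1 + a2 * b2
    num_im = a2 * b1 - a1 * b2
    # round each coordinate of the quotient to a nearest integer
    q1 = (2 * num_re + nb) // (2 * nb)
    q2 = (2 * num_im + nb) // (2 * nb)
    r1 = a1 - (q1 * b1 - q2 * b2)
    r2 = a2 - (q1 * b2 + q2 * b1)
    return (q1, q2), (r1, r2)


quotient, remainder = divide((27, 23), (8, 1))
```

Rounding moves each coordinate of the quotient by at most 1/2, so the remainder has norm at most half the norm of b.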
Prime divisor property for principal ideal domains. If p is a prime in a
principal ideal domain and p divides ab, then p divides a or p divides b.

Proof. Suppose p is a prime that divides ab but not a, so we have to prove
that p divides b.

R a principal ideal domain ⇒ ideal {ar + ps : r, s ∈ R} = (t) for some t ∈ R
⇒ (t) ⊇ (a) and (t) ⊇ (p)
⇒ t divides a and p
⇒ t = 1 (up to a unit factor) since p is prime and p does not divide a
⇒ 1 = ar + ps for some r, s ∈ R
⇒ b = abr + pbs
⇒ p divides b. □

These theorems give a uniform explanation why the prime divisor
property (and hence unique prime factorization) holds in the rings Z, Z[i],
Z[√−2], and Z[ζ_3]: they are all Euclidean, and hence they are principal
ideal domains.

The principal ideal property of Z[i] has an interesting geometric
interpretation. As we observed in Section 6.4, the principal ideal (β) of all
multiples of β ≠ 0 in Z[i] is a lattice of the same shape as Z[i] (and
magnified by |β|). Therefore, since all ideals of Z[i] are principal, it follows
that any ideal ≠ (0) in Z[i] has the same shape as Z[i]. The same is true of
Z[√−2] and Z[ζ_3], as we observed in Sections 7.2 and 7.4. (We say that
two sets in the plane have the “same shape” if one can be mapped onto the
other by a function that multiplies all distances by a constant.)

Conversely, nonprincipal ideals exist only when unique prime
factorization fails, and therefore differently shaped ideals exist only when unique
prime factorization fails. We shall see that they actually occur in Z[√−3]
and Z[√−5].


Exercises 


It happens that the only imaginary quadratic fields Q(√d) whose integers form
Euclidean rings are those with d = −1, −2, −3, −7, −11. The only two we have
not studied already are Q(√−7) and Q(√−11), the integer rings of which are
Z[(1 + √−7)/2] and Z[(1 + √−11)/2] respectively, because d ≡ 1 (mod 4) in these
cases. We let α = (1 + √−7)/2 or (1 + √−11)/2 and consider the points of Z[α] in
the plane C.

As usual (see Chapter 7), the division property is implied by the statement
that any point of the plane is at distance < 1 from the nearest point of Z[α]. We
prove this with the help of Figure 11.4, which shows 0 and its 6 neighbors ±1,
±α, ±(α − 1). The hexagonal region around 0 is bounded by the perpendicular
bisectors of the lines from 0 to its neighbors, and hence the points inside it are
those that are nearer to 0 than to any neighbor. Since any point of Z[α] looks like
0, it suffices to prove that the point on the hexagon farthest from 0, namely ai, is
at distance < 1.


11.3.1 The point ai is equidistant from 0 and α (why?). Deduce that

\[
|ai| = |ai - \alpha| = \left| -\frac{1}{2} + \frac{2a - \sqrt{|d|}}{2}\, i \right|.
\]


11.3.2 Deduce from Exercise 11.3.1 that a = (|d| + 1)/(4√|d|).

11.3.3 Deduce from Exercise 11.3.2 that a < 1 for d = −7, −11, and hence that
Z[α] has the division property in those cases.

Figure 11.4: The region of points nearer to 0 than its neighbors

Thus Z[(1 + √−7)/2] and Z[(1 + √−11)/2] have unique prime factorization, which we
can apply to solve the equations y³ = x² + 7 and y³ = x² + 11 as we solved y³ =
x² + 1, x² + 2, and x² + 4 in Chapter 7 and its exercises. The case of y³ = x² + 11
is particularly interesting, because one of its solutions is large enough not to be
obvious.

11.3.4 If y³ = x² + 11 = (x + √−11)(x − √−11), use unique prime factorization
to show that

x + √−11 = (a³ − 33ab²) + (3a²b − 11b³)√−11

where a, b are both integers or else both half-integers that are not integers.

11.3.5 Find one integer solution of y³ = x² + 11 by finding an integer solution of
the equation b(3a² − 11b²) = 1.

11.3.6 Find another integer solution of y³ = x² + 11 by finding a half-integer
solution of b(3a² − 11b²) = 1, and show that it is the only other integer solution
x, y (up to the sign of x).

11.4 A nonprincipal ideal of Z[√−3]


We might attempt to understand the nonunique prime factorization of 4 in
Z[√−3],

4 = 2 × 2 = (1 + √−3)(1 − √−3),

by looking for an “ideal” common divisor of 2 and 1 + √−3 to split the
factors. Using the idea of Section 11.1, we form the ideal (2) + (1 + √−3)
by adding all multiples of 2 to all multiples of 1 + √−3.

An arbitrary element of (2) + (1 + √−3) is

2(a + b√−3) + (1 + √−3)(c + d√−3) for some a, b, c, d ∈ Z
= 2(a − b − 2d) + (1 + √−3)(2b + c + d)
= 2m + (1 + √−3)n for some m, n ∈ Z.

Conversely, 2m + (1 + √−3)n ∈ (2) + (1 + √−3) for any m, n ∈ Z, and
therefore

(2) + (1 + √−3) = {2m + (1 + √−3)n : m, n ∈ Z}.

Figure 11.5 shows the elements of this ideal, marked in black, on a
picture of Z[√−3]. It is clear that (2) + (1 + √−3) consists of equilateral
triangles, and hence it is not of the same shape as Z[√−3]. (For example,
no point of the ideal has neighbors in perpendicular directions.)
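
The rearrangement used above, with m = a − b − 2d and n = 2b + c + d, can be checked mechanically. In this sketch a pair (x, y) stands for x + y√−3 and the function names are our own; the check runs over a small range of integer values only.

```python
from itertools import product


def general_element(a, b, c, d):
    """2(a + b*sqrt(-3)) + (1 + sqrt(-3))(c + d*sqrt(-3)) as a pair (x, y)."""
    return (2 * a + c - 3 * d, 2 * b + c + d)


def from_mn(m, n):
    """2m + (1 + sqrt(-3))n as a pair (x, y)."""
    return (2 * m + n, n)


def identity_holds(bound=3):
    """Check the rewriting with m = a - b - 2d, n = 2b + c + d on a small range."""
    values = range(-bound, bound + 1)
    return all(
        general_element(a, b, c, d) == from_mn(a - b - 2 * d, 2 * b + c + d)
        for a, b, c, d in product(values, repeat=4)
    )
```

Reading off from_mn, a point x + y√−3 lies in the ideal exactly when x − y is even.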


Figure 11.5: The ideal (2) + (1 + √−3) in Z[√−3]

Thus (2) + (1 + √−3) is a nonprincipal ideal: it does not consist of
the multiples of any member of Z[√−3]. However, we can dream that
(2) + (1 + √−3) consists of multiples of an “ideal number” (something
outside Z[√−3]) and this dream is easily realized.

The shape of (2) + (1 + √−3) is exactly the same as the shape of
Z[(1 + √−3)/2], the set of multiples of (1 + √−3)/2 (Figure 11.6), so
(1 + √−3)/2 is the desired “ideal number”. In Z[(1 + √−3)/2], (1 + √−3)/2
divides both 2 and 1 + √−3 and its norm is 1, hence (1 + √−3)/2 is the gcd
of 2 and 1 + √−3.


Figure 11.6: The principal ideal ((1 + √−3)/2) = Z[(1 + √−3)/2]


Thus the “ideal number” whose “multiples” make up the nonprincipal 
ideal really exists in this case, but outside the original ring. 


Exercises 


An interesting variation of this phenomenon occurs in Z[√−7], where the ideal
(2) + (1 + √−7) is nonprincipal. It is in fact the same shape as the principal ideal
of multiples of (1 + √−7)/2 in Z[(1 + √−7)/2], though this is not immediately
obvious.

11.4.1 Show that (2) + (1 + √−7) = {2m + (1 + √−7)n : m, n ∈ Z}.

11.4.2 Using the approximation √7 ≈ 2.6, sketch a picture of Z[√−7] and mark
the members of (2) + (1 + √−7).

11.4.3 Show that (2) + (1 + √−7) is not the same shape as Z[√−7], and hence
is not a principal ideal.

11.4.4 Show that gcd(2, 1 + √−7) = (1 + √−7)/2 in Z[(1 + √−7)/2], and sketch the
principal ideal ((1 + √−7)/2) in a picture of Z[(1 + √−7)/2].

11.4.5 By computing the side lengths in a triangle of this principal ideal, show
that it has the same shape as (2) + (1 + √−7) in Z[√−7].

11.4.6 Show that 8 has distinct prime factorizations in Z[√−7], but that each of
these factorizations splits into the same prime factorization in Z[(1 + √−7)/2].

11.5 A nonprincipal ideal of Z[√−5]


Like Z[√−3], Z[√−5] fails to have the prime divisor property, as is shown
by the two factorizations

2 × 3 = (1 + √−5)(1 − √−5).

The numbers 2, 3, 1 + √−5, 1 − √−5 have norms 4, 9, 6, 6 respectively,
whose divisors 2 and 3 are not the norm a² + 5b² of any member a + b√−5
of Z[√−5]. Hence none of 2, 3, 1 + √−5, 1 − √−5 is a product of numbers
of smaller norm, and so they are primes of Z[√−5].

To understand this nonunique prime factorization we first construct the
“ideal” gcd of 2 and 1 + √−5: the set (2) + (1 + √−5) of sums of multiples
of 2 and multiples of 1 + √−5. A similar calculation to the one in the
previous section shows that any member

2(a + b√−5) + (1 + √−5)(c + d√−5) ∈ (2) + (1 + √−5)

is actually of the form 2m + (1 + √−5)n for some m, n ∈ Z (exercise).
Conversely, any such number 2m + (1 + √−5)n is in (2) + (1 + √−5), so

(2) + (1 + √−5) = {2m + (1 + √−5)n : m, n ∈ Z}.


Figure 11.7 shows this ideal as black dots in a picture of Z[√−5]. It
is clear from the picture that no black dot has neighbors in perpendicular
directions, so this ideal is not of the same shape as Z[√−5]. In particular,
it is not a principal ideal (β), since (β) is simply Z[√−5] multiplied by β,
which multiplies all distances by |β| and hence produces a set of the same
shape, as observed in Section 8.1. Thus (2) + (1+√−5) is nonprincipal.
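
The description of (2) + (1+√−5) as {2m + (1+√−5)n : m, n ∈ Z} can be tested numerically. In the sketch below (an illustration under our own conventions, with a + b√−5 stored as the pair (a, b)), membership reduces to a parity condition: n must equal b, so 2m = a − b must be even. The helper names and the sizes of the search windows are arbitrary assumptions.

```python
def mul(u, v):
    """Product in Z[sqrt(-5)] of u = (a, b) and v = (c, d)."""
    (a, b), (c, d) = u, v
    return (a * c - 5 * b * d, a * d + b * c)

def in_ideal(u):
    """Is u = a + b*sqrt(-5) of the form 2m + (1 + sqrt(-5))n with m, n integers?"""
    a, b = u
    return (a - b) % 2 == 0   # n = b forces 2m = a - b

window = range(-6, 7)
combos = {(2 * m + n, n) for m in window for n in window}
ring = [(c, d) for c in range(-3, 4) for d in range(-3, 4)]

# Every combination 2m + (1 + sqrt(-5))n passes the parity test.
all_in = all(in_ideal(u) for u in combos)
# The set is closed under multiplication by members of the ring.
closed = all(in_ideal(mul(u, r)) for u in combos for r in ring)
# Conversely, every a + b*sqrt(-5) with a - b even in a small box is a combination.
box = [(a, b) for a in range(-5, 6) for b in range(-5, 6) if in_ideal((a, b))]
covered = all(u in combos for u in box)
print(all_in, closed, covered)
```

A finite check of this kind is not a proof, but it agrees with the calculation left as an exercise.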

We would like to view the members of (2) + (1+√−5) as multiples of
some “ideal number” outside Z[√−5]. However, Z[√−5] already contains
all the integers of Q(√−5), so it is not clear where this “ideal number” can
be found. Instead, we follow Dedekind (1871) and do without the “ideal
number”, working directly with the ideal.

As we have seen, a principal ideal (β) behaves the same as the number
β in the sense that

(β) contains (γ) ⇔ β divides γ.

In this sense, the nonprincipal ideal (2) + (1+√−5) behaves like a number
dividing both 2 and 1+√−5, because

(2) + (1+√−5) contains (2),   (2) + (1+√−5) contains (1+√−5).

Figure 11.7: The nonprincipal ideal (2) + (1+√−5) of Z[√−5]


And in fact we have seen that (2) + (1+√−5) deserves to be the “greatest
common divisor” of 2 and 1+√−5 in Z[√−5], since the analogous ideal
in Z, (a) + (b) = {am + bn : m, n ∈ Z}, is (gcd(a, b)), as we saw in Section
11.1.

Not only that: (2) + (1+√−5) deserves to be called a prime. We
noted in Section 11.2 that a prime principal ideal (p) is maximal in the
sense that (1) and (p) are the only ideals containing it. Likewise, the only
ideals containing (2) + (1+√−5) are (1) and (2) + (1+√−5) itself. This
is because

(2) + (1+√−5) = {2m + (1+√−5)n : m, n ∈ Z}

consists of the numbers of the form even + (1+√−5)n. Hence any member
of Z[√−5] not in (2) + (1+√−5) is of the form odd + (1+√−5)n. But
an ideal including all numbers even + (1+√−5)n and at least one number
odd + (1+√−5)n′ obviously includes 1, hence all of Z[√−5] by closure
under multiplication by members of Z[√−5]. Thus (2) + (1+√−5) is a
maximal ideal and we shall see, once we have defined the product of ideals,
that maximal ideals are prime.

We also need to define the product of ideals to confirm that the ideal
(2) + (1+√−5) divides (2) in the usual sense of ring theory, namely (2) =
((2) + (1+√−5)) × J for some ideal J. This will be done in Section 11.7,
and we find the mystery factor J in Section 11.8.




Exercises 


11.5.1 Check that 2(a + b√−5) + (1+√−5)(c + d√−5), when a, b, c, d ∈ Z, is
of the form 2m + (1+√−5)n for some m, n ∈ Z.

Also important in reconciling the two prime factorizations of 6 in Z[√−5] is
the gcd ideal of 3 and 1+√−5, the general member of which is of the form

3(a + b√−5) + (1+√−5)(c + d√−5), where a, b, c, d ∈ Z.

11.5.2 Check that 3(a + b√−5) + (1+√−5)(c + d√−5), a, b, c, d ∈ Z, is of the
form 3m + (1+√−5)n for some m, n ∈ Z.

11.5.3 Deduce from Exercise 11.5.2 that

(3) + (1+√−5) = {3m + (1+√−5)n : m, n ∈ Z}.


11.6 Ideals of imaginary quadratic fields as lattices 


Although ideals in rings of quadratic integers are not always principal, we
can nonetheless prove an analogue of the theorem that Z is a principal
ideal domain. Ideals I of Z each have one generator, in the sense that each
consists of the integer multiples of some integer a. Z itself is the ideal with
generator 1. The analogous theorem (with a similar proof) for ideals I in
the integers of Q(√d) says that they have two generators, like the integers
of Q(√d) themselves.

The description of the integers of Q(√d) in Section 10.4 shows that
they comprise either Z[(1+√d)/2] or Z[√d]. In either case, they form a
subgroup L of C with two generators: 1 and (1+√d)/2 for Z[(1+√d)/2], and
1 and √d for Z[√d]. When d < 0 the generators of L can be described
geometrically as two nonzero members nearest to 0 but not on the same line
through 0, and the group they generate is called a lattice.

In general, a lattice L in C is a set {αm + βn : m, n ∈ Z} where α and
β are nonzero complex numbers not on the same line through 0. The pair
α, β of generators is called an integral basis for L. The elements of L lie
at the intersections of two families of parallel lines forming a “lattice” in
the ordinary sense of the word (Figure 11.8). An important property of the
lattice of integers in Q(√d) is that any subgroup of it, and hence any ideal,
is also a lattice.


Lattice property of ideals. When d < 0, any nonzero ideal in the integers
of Q(√d) is a lattice.

Figure 11.8: A lattice in C.


Proof. Suppose I is a nonzero ideal in the integers of Q(√d), and let α be
a nonzero element of I nearest to 0. Since I is closed under sums and also
under multiplication by −1, I also includes all ordinary integer multiples of
α. But this is not all, because I also includes α√d, which is in a direction
from 0 perpendicular to the direction of α (since d < 0).

Now let β ∈ I be as close as possible to 0 but not an ordinary integer
multiple of α. I claim that the lattice {αm + βn : m, n ∈ Z} = I.

If not, let γ be a member of I not in {αm + βn : m, n ∈ Z}, and consider
the parallelogram of the lattice that includes γ (Figure 11.9). Now γ
necessarily lies in one quarter of the parallelogram, in which case its
distance from the nearest corner αm + βn is less than |α|/2 + |β|/2 by the
triangle inequality, hence less than the length max(|α|, |β|) of the longest
side of the parallelogram. But then the element γ − (αm + βn) of I is at
distance < max(|α|, |β|) from 0, contrary to the choice of α and β. □

Figure 11.9: Lattice point αm + βn nearest to γ.

The proof above does not assume that I is an ideal, only that it is closed
under + and − (and hence is a group). Assuming I = {αm + βn : m, n ∈ Z}
is an ideal leads to the stronger conclusion that I = (α) + (β), the sum of
the principal ideals generated by α and β.

We always have I = {αm + βn : m, n ∈ Z} ⊆ (α) + (β), since each
αm ∈ (α) and each βn ∈ (β). Conversely, {αm + βn : m, n ∈ Z} certainly
includes α. Hence if I is an ideal then it includes all members of (α),
by closure under multiplication by ring members. Similarly, I includes all
members of (β), hence I ⊇ (α) + (β) by closure under sums.
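
An integral basis for an ideal of Z[√−5] given by generators can also be found mechanically. The sketch below does not follow the nearest-point argument of the proof; instead it assumes a different, standard technique, namely Euclidean reduction of integer vectors to a triangular (Hermite-style) basis. Elements a + b√−5 are stored as pairs (a, b), and the names mul and integral_basis are our own.

```python
from math import gcd

def mul(u, v):
    """Product in Z[sqrt(-5)] of u = (a, b) and v = (c, d)."""
    (a, b), (c, d) = u, v
    return (a * c - 5 * b * d, a * d + b * c)

def integral_basis(generators):
    """Basis ((A, 0), (B, C)) of the nonzero ideal generated by `generators`.

    The result satisfies A > 0, C > 0 and 0 <= B < A, so two ideals are
    equal exactly when their bases agree.
    """
    # As an additive group, the ideal is spanned by g and g*sqrt(-5)
    # for each generator g, because the ring is Z + Z*sqrt(-5).
    vectors = []
    for g in generators:
        vectors += [g, mul(g, (0, 1))]
    pivot, A = (0, 0), 0
    for v in vectors:
        # Euclidean algorithm on the sqrt(-5)-coordinates of pivot and v.
        while v[1] != 0:
            q = pivot[1] // v[1]
            pivot, v = v, (pivot[0] - q * v[0], pivot[1] - q * v[1])
        A = gcd(A, v[0])   # v is now an ordinary integer in the ideal
    B, C = pivot if pivot[1] > 0 else (-pivot[0], -pivot[1])
    return ((A, 0), (B % A, C))

print(integral_basis([(2, 0), (1, 1)]))   # the ideal (2) + (1 + sqrt(-5))
print(integral_basis([(3, 0), (1, 1)]))   # the ideal (3) + (1 + sqrt(-5))
print(integral_basis([(1, 1)]))           # the principal ideal (1 + sqrt(-5))
```

The basis returned is normalized rather than geometric: it need not consist of elements nearest to 0, but it generates the same lattice.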


Exercises 


It should be stressed that, while an ideal {αm + βn : m, n ∈ Z} is necessarily the
sum ideal (α) + (β), the converse does not always hold. Certainly, (α) + (β) ⊇
{αm + βn : m, n ∈ Z}, as we observed above, but (α) + (β) may include members
not of the form αm + βn for m, n ∈ Z. In this case the pair α, β is not an integral
basis for (α) + (β).

11.6.1 Show that (5) + (1+√−5) in Z[√−5] includes the element √−5.

11.6.2 Show that √−5 ≠ 5m + (1+√−5)n for all m, n ∈ Z.

Thus (5) + (1+√−5) ≠ {5m + (1+√−5)n : m, n ∈ Z}, and therefore 5,
1+√−5 is not an integral basis for (5) + (1+√−5). However, we know that
(5) + (1+√−5) has some integral basis α, β, by the lattice property of ideals
proved above.

11.6.3 Find α, β ∈ Z[√−5] such that (5) + (1+√−5) = {αm + βn : m, n ∈ Z}.


11.7 Products and prime ideals 


Since we want a principal ideal (s) of a ring R to behave like the element s 
of R, the product (s)(t) of principal ideals (s) and (t) should be (st). This 
means that the product of any member rs of (s) and any member r’t of (t) is 
a member rr’st of (s)(t), and hence so too is the sum of any such products. 
We use this idea to define the product of any two ideals. 


Definition. The product AB of ideals A and B in a ring R is

AB = {a₁b₁ + a₂b₂ + ··· + a_k b_k : a_i ∈ A, b_i ∈ B}.

It is clear that AB is closed under sums and under products by members
of R, hence it is an ideal. We now define prime ideals as those that have
the “prime divisor property” with respect to products of ideals.

Definition. An ideal P is prime if, whenever P contains (that is, “divides”)
the product AB of ideals, P contains A or P contains B.


A more concise way to state this definition of prime ideal is:

AB ⊆ P ⇒ A ⊆ P or B ⊆ P.    (1)

A definition involving membership rather than containment is:

ab ∈ P ⇒ a ∈ P or b ∈ P,    (2)

and the two are equivalent by the following theorem.

Equivalent definitions of prime ideal. The following properties of an
ideal P are equivalent:
(1) AB ⊆ P ⇒ A ⊆ P or B ⊆ P,
(2) ab ∈ P ⇒ a ∈ P or b ∈ P.

Proof. (1) ⇒ (2):

ab ∈ P ⇒ (ab) ⊆ P               by closure of P
       ⇒ (a)(b) ⊆ P             by definition of (a)(b)
       ⇒ (a) ⊆ P or (b) ⊆ P     by property (1)
       ⇒ a ∈ P or b ∈ P         since a ∈ (a), b ∈ (b).

(2) ⇒ (1):
Suppose AB ⊆ P and A ⊄ P, so we want to show that B ⊆ P. Since A ⊄ P,
there is an a ∈ A with a ∉ P. Then

AB ⊆ P ⇒ ab ∈ P             for any b ∈ B, by definition of AB
       ⇒ a ∈ P or b ∈ P     for any b ∈ B, by property (2)
       ⇒ b ∈ P              since a ∉ P by assumption
       ⇒ B ⊆ P              since this is for any b ∈ B. □


As mentioned in Section 11.2, prime principal ideals are “maximal” in 
the sense that the only ideal properly containing them is the whole ring. 
However, “prime” and “maximal” are not always the same, thus we need a 
separate definition of maximal ideals. 


Definition. An ideal M in a ring R is maximal if M ≠ R but the only ideals
containing M are R and M itself.

We can now prove a relationship between prime and maximal ideals in
one direction:


Primality of maximal ideals. Every maximal ideal is prime. 


Proof. Suppose that M is a maximal ideal, ab ∈ M and a ∉ M. Thus (using
the second definition of prime ideal) we want to prove that b ∈ M.
Since M is maximal and a ∉ M, the ideal

M + (a) = {ar + ms : r, s ∈ R, m ∈ M},

which contains both M and a, must be all of R. This means that 1 in
particular is of the form ar + ms. Now we can use a familiar trick:

1 = ar + ms ⇒ b = abr + mbs
            ⇒ b ∈ M, since ab ∈ M and m ∈ M. □


Examples of prime ideals in Z[√−5]

• In Section 11.5 we found that the nonprincipal ideal (2) + (1+√−5)
  is maximal, hence it is prime by the above theorem.

• Another one is (3) + (1+√−5) = {3m + (1+√−5)n : m, n ∈ Z},
  which is maximal for the following reason. Elements not in it are of
  the form 3m′ + 1 + (1+√−5)n′ or 3m′ + 2 + (1+√−5)n′. Now an
  ideal containing (3) + (1+√−5) and some 3m′ + 1 + (1+√−5)n′
  includes 1, so it is Z[√−5]. An ideal containing (3) + (1+√−5)
  and some 3m′ + 2 + (1+√−5)n′ includes 2, hence it also includes
  1 = 3 − 2, so this ideal too is Z[√−5].

• The ideal (3) + (1−√−5) is maximal, hence prime, by a similar
  argument. It is called the conjugate of (3) + (1+√−5), since its
  members are the conjugates of the members of (3) + (1+√−5).

Like (2) + (1+√−5), (3) + (1+√−5) and its conjugate are nonprincipal
ideals in Z[√−5]. This can be seen by making a picture of
(3) + (1+√−5) and observing that no element has neighbors in perpendicular
directions, hence it is not the same shape as Z[√−5]. Rather surprisingly,
it has the same shape as (2) + (1+√−5) (exercises).




Exercises 


In studying the shape of lattices, the key fact to be aware of is that if α and β are
complex numbers in different directions from 0, then the ratio of their distances
from 0 is |α/β| and the angle between their directions is arg(α/β). Thus the shape
of the parallelogram determined by 0, α, and β is determined by the quotient
α/β.

11.7.1 Sketch a picture of (3) + (1+√−5) accurate enough to show that it is not
the same shape as Z[√−5].

11.7.2 Explain why the lattices (3) + (1+√−5) and (3) + (1−√−5) have the
same shape.

11.7.3 By considering the quotients 2/(1+√−5) and (1−√−5)/3, show that
the lattices (2) + (1+√−5) and (3) + (1−√−5) have the same shape.

It turns out that all nonprincipal ideals of Z[√−5] have the same shape as
(2) + (1+√−5) (see Section 12.7). Thus exactly two shapes of ideals occur for
Z[√−5]. The classical way to say this is that the class number of Z[√−5] is 2.


11.8 Ideal prime factorization


Now that we have a definition of product for ideals, we can define divisibility
as we do for any commutative product: B divides A means there is a C
such that A = BC. But we previously suggested that “divides” should mean
“contains” for ideals, so now is our chance to test the merit of containment
as a concept of divisibility.

Our examples are nonprincipal ideals in Z[√−5], such as the ideal
(2) + (1+√−5). To make products easier to read, we write the ideal with
integral basis α, β as (α, β); for example (2) + (1+√−5) = (2, 1+√−5).
(This conflicts with the notation for ordered pairs; however, no ordered pairs
will turn up to cause confusion.)

The first example is the prime ideal (2, 1+√−5), which contains the
principal ideal (2). Is there an ideal C such that

(2) = (2, 1+√−5)C?

Happily, the answer is yes, and in fact we have the following.


Ideal prime factorization of (2). (2) = (2, 1+√−5)².

Proof. It follows from the definition of product of ideals that

4 = 2 × 2 ∈ (2, 1+√−5)²,
2 + 2√−5 = 2 × (1+√−5) ∈ (2, 1+√−5)²,
−4 + 2√−5 = (1+√−5)² ∈ (2, 1+√−5)².

And

4, 2 + 2√−5, −4 + 2√−5 ∈ (2, 1+√−5)²
⇒ 2 ∈ (2, 1+√−5)²    by closure under + and −
⇒ all multiples of 2 ∈ (2, 1+√−5)²
      by closure under multiplication by members of Z[√−5]
⇒ (2) ⊆ (2, 1+√−5)².

Conversely, any element of (2, 1+√−5)² is a sum of products of terms 2m
and (1+√−5)n. Any product involving 2m is a multiple of 2, and so is
any product involving (1+√−5)² = −4 + 2√−5. Therefore, any element
of (2, 1+√−5)² is a multiple of 2, hence (2, 1+√−5)² ⊆ (2), as required.
□
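
The arithmetic behind this proof is easy to replay by machine. The following Python sketch (an illustration only, with a + b√−5 stored as the pair (a, b) and helper names of our own choosing) computes the three products used above, exhibits 2 as a combination of them using only + and −, and confirms that each product is a multiple of 2.

```python
def mul(u, v):
    """Product in Z[sqrt(-5)] of u = (a, b) and v = (c, d)."""
    (a, b), (c, d) = u, v
    return (a * c - 5 * b * d, a * d + b * c)

def sub(u, v):
    """Difference u - v, coordinate by coordinate."""
    return (u[0] - v[0], u[1] - v[1])

two, beta = (2, 0), (1, 1)          # 2 and 1 + sqrt(-5)
p1 = mul(two, two)                  # 4
p2 = mul(two, beta)                 # 2 + 2*sqrt(-5)
p3 = mul(beta, beta)                # -4 + 2*sqrt(-5)
print(p1, p2, p3)

# 2 = (2 + 2*sqrt(-5)) - (-4 + 2*sqrt(-5)) - 4, so 2 lies in the square.
two_again = sub(sub(p2, p3), p1)
print(two_again)

# Each product has both coordinates even, so it is a multiple of 2.
all_even = all(a % 2 == 0 and b % 2 == 0 for a, b in (p1, p2, p3))
print(all_even)
```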


Likewise, the two prime ideals (3, 1+√−5) and (3, 1−√−5) contain
(3), and they are in fact its ideal prime factors.

Ideal prime factorization of (3). (3) = (3, 1+√−5)(3, 1−√−5).

Proof. It follows from the definition of product of ideals that

9 = 3 × 3 ∈ (3, 1+√−5)(3, 1−√−5),
6 = (1+√−5)(1−√−5) ∈ (3, 1+√−5)(3, 1−√−5).

And

9, 6 ∈ (3, 1+√−5)(3, 1−√−5)
⇒ 3 ∈ (3, 1+√−5)(3, 1−√−5)    by closure under −
⇒ all multiples of 3 ∈ (3, 1+√−5)(3, 1−√−5)
      by closure under multiplication by members of Z[√−5]
⇒ (3) ⊆ (3, 1+√−5)(3, 1−√−5).

Conversely, any element of (3, 1+√−5)(3, 1−√−5) is a sum of products
of terms 3m and (1±√−5)n. Any product involving 3m is a multiple of
3, and so is any product involving (1+√−5)(1−√−5) = 6. It follows
that any element of (3, 1+√−5)(3, 1−√−5) is a multiple of 3, hence
(3, 1+√−5)(3, 1−√−5) ⊆ (3) as required. □


These two factorizations imply that the prime factorization 2 × 3 of 6
in Z[√−5] actually splits further, into the ideal prime factorization

6 = (2, 1+√−5)²(3, 1+√−5)(3, 1−√−5),

when we replace 2 and 3 by the corresponding principal ideals (2) and
(3). Even more marvellous, the ideal factors recombine to produce the
other prime factorization in Z[√−5], 6 = (1+√−5)(1−√−5) (where
(1+√−5) and (1−√−5) are taken as principal ideals), because

(1+√−5) = (2, 1+√−5)(3, 1+√−5),
(1−√−5) = (2, 1+√−5)(3, 1−√−5).

The latter factorizations can be checked along the same lines as the ideal
prime factorizations of (2) and (3) above (exercises).
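
All four ideal factorizations can be confirmed numerically by comparing normalized integral bases. The sketch below is an illustration under stated assumptions: it uses Euclidean reduction of integer vectors (our choice of technique, not one used in the text), stores a + b√−5 as the pair (a, b), and relies on the fact that a product of ideals is generated by the products of their generators. The names basis and ideal_product are hypothetical.

```python
from math import gcd

def mul(u, v):
    """Product in Z[sqrt(-5)] of u = (a, b) and v = (c, d)."""
    (a, b), (c, d) = u, v
    return (a * c - 5 * b * d, a * d + b * c)

def basis(generators):
    """Normalized integral basis ((A, 0), (B, C)) of a nonzero ideal."""
    vectors = []
    for g in generators:
        vectors += [g, mul(g, (0, 1))]      # g and g*sqrt(-5)
    pivot, A = (0, 0), 0
    for v in vectors:
        while v[1] != 0:                    # Euclid on second coordinates
            q = pivot[1] // v[1]
            pivot, v = v, (pivot[0] - q * v[0], pivot[1] - q * v[1])
        A = gcd(A, v[0])
    B, C = pivot if pivot[1] > 0 else (-pivot[0], -pivot[1])
    return ((A, 0), (B % A, C))

def ideal_product(gens_a, gens_b):
    """Generators of AB: every generator of A times every generator of B."""
    return [mul(a, b) for a in gens_a for b in gens_b]

P2 = [(2, 0), (1, 1)]     # (2, 1 + sqrt(-5))
P3 = [(3, 0), (1, 1)]     # (3, 1 + sqrt(-5))
P3c = [(3, 0), (1, -1)]   # (3, 1 - sqrt(-5))

checks = {
    "(2) = P2^2": basis(ideal_product(P2, P2)) == basis([(2, 0)]),
    "(3) = P3 * P3c": basis(ideal_product(P3, P3c)) == basis([(3, 0)]),
    "(1+sqrt(-5)) = P2 * P3": basis(ideal_product(P2, P3)) == basis([(1, 1)]),
    "(1-sqrt(-5)) = P2 * P3c": basis(ideal_product(P2, P3c)) == basis([(1, -1)]),
}
for name, ok in checks.items():
    print(name, ok)
```

Because the normalized basis of a lattice is unique, equal bases mean equal ideals; the computation is a check, not a substitute for the proofs requested in the exercises.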

Thus the two prime factorizations 2 × 3 and (1+√−5)(1−√−5) of
6 are actually different groupings of factors in the same ideal prime
factorization. Admittedly, this does not prove that ideal prime factorization is
unique in Z[√−5], but it shows how uniqueness might be possible, and the
next chapter will explain why it is in fact true.


Exercises 


11.8.1 Show that 1+√−5 ∈ (2, 1+√−5)(3, 1+√−5), so that


Hence deduce from Exercise 11.8.1 that
(1+√−5) = (2, 1+√−5)(3, 1+√−5).

11.8.3 Show similarly that (1−√−5) = (2, 1+√−5)(3, 1−√−5).


Discussion


The failure of unique prime factorization was a deeply hidden problem,
which remained undetected for nearly two centuries after its side effects
were first noticed. The first such effect is the anomalous behavior of the
quadratic form x² + 5y², noticed by Fermat in 1654. As we know, Fermat
had successfully classified primes of the form x² + y², x² + 2y², and x² + 3y²
before this, and we have seen in Chapters 7 and 9 how his classification
can be explained with the help of unique prime factorization in Z[√−1],
Z[√−2], and Z[(−1+√−3)/2] respectively. Fermat presumably did not
conceptualize the problem this way, otherwise he would have seen trouble in
store for x² + 5y², due to the failure of unique prime factorization in Z[√−5].

He did indeed have trouble, but for reasons that are unclear he left only
a conjecture about prime products of the form x² + 5y²: if p₁, p₂ are primes
of the form 20n + 3 or 20n + 7, then p₁p₂ = x² + 5y².

This is strange, because primes of the form x² + 5y² exist (for example
29 = 3² + 5 × 2²) and an easy application of congruences mod 20 shows
that any prime p = x² + 5y² must be of the form 20n + 1 or 20n + 9. Euler
(1744) evidently realized this, and conjectured that the converse is also
true, thus:

p = x² + 5y² ⇔ p = 20n + 1 or 20n + 9.

Putting the conjectures of Fermat and Euler together, we have the
conjecture that x² + 5y² is the form of

• primes of the form 20n + 1 or 20n + 9,
• products of two primes of the form 20n + 3 or 20n + 7.
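
Both halves of this combined conjecture are easy to test on small numbers. The following Python sketch is a numerical experiment only; the search bounds 1000 and 200 and the function names are arbitrary choices of ours. The prime 5 = 0² + 5 × 1² is set aside, since it divides 20 and falls outside the congruence classes considered.

```python
from math import isqrt

def is_prime(n):
    """Trial division, adequate for the small range used here."""
    return n > 1 and all(n % d for d in range(2, isqrt(n) + 1))

def is_x2_plus_5y2(n):
    """True if n = x^2 + 5y^2 for some integers x, y."""
    y = 0
    while 5 * y * y <= n:
        x = isqrt(n - 5 * y * y)
        if x * x + 5 * y * y == n:
            return True
        y += 1
    return False

primes = [p for p in range(2, 1000) if is_prime(p)]

# Euler: a prime p (other than 5) is x^2 + 5y^2 exactly when p = 20n + 1 or 20n + 9.
euler_ok = all(is_x2_plus_5y2(p) == (p % 20 in (1, 9)) for p in primes if p != 5)

# Fermat: a product of two primes of the form 20n + 3 or 20n + 7 is x^2 + 5y^2.
small = [p for p in primes if p < 200 and p % 20 in (3, 7)]
fermat_ok = all(is_x2_plus_5y2(p * q) for p in small for q in small)

print(euler_ok, fermat_ok)
```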


The first to account for this two-faced behavior of x² + 5y² was Lagrange
(1773), who discovered that x² + 5y² has a hidden companion, the
quadratic form 2x² + 2xy + 3y², whose prime values are of the form
20n + 3 or 20n + 7. Lagrange made this discovery through his theory of
equivalence of quadratic forms that we introduced in Section 5.6. He
discovered that equivalent forms have the same determinant, and that:

All forms with determinant 1 are equivalent to x² + y².
All forms with determinant 2 are equivalent to x² + 2y².
All forms with determinant 3 are equivalent to x² + 3y².

Whereas:

For determinant 5 there are two inequivalent forms: x² + 5y² and
2x² + 2xy + 3y².

These discoveries throw new light on the regular behavior of x² + y²,
x² + 2y², and x² + 3y², and also suggest why x² + 5y² is irregular. The
determinants 1, 2, and 3 each have only one equivalence class of forms, or class
number 1, which is the simplest situation (now recognized as equivalent to
unique prime factorization in the corresponding ring). The determinant 5
has two equivalence classes, or class number 2, and this is more complicated
because the two forms interact with each other. Lagrange saw that the
product of two numbers of the form 2x² + 2xy + 3y² is of the form x² + 5y²,
because

(2x₁² + 2x₁y₁ + 3y₁²)(2x₂² + 2x₂y₂ + 3y₂²) = X² + 5Y²,

where X = 2x₁x₂ + x₁y₂ + x₂y₁ − 2y₁y₂ and Y = x₁y₂ + x₂y₁ + y₁y₂. He
also saw that a number of the form x² + 5y² times a number of the form
2x² + 2xy + 3y² is again of the form 2x² + 2xy + 3y² because

(x₁² + 5y₁²)(2x₂² + 2x₂y₂ + 3y₂²) = 2X² + 2XY + 3Y²

for suitable integers X and Y.


These stellar feats of high school algebra can be emulated in a fairly
mechanical way by using factorizations in Q(√−5) and combining terms
so as to produce conjugate factors. For example, since

x² + 5y² = (x + y√−5)(x − y√−5),
2x² + 2xy + 3y² = 2[x + (y/2)(1+√−5)][x + (y/2)(1−√−5)],

we have

(x₁² + 5y₁²)(2x₂² + 2x₂y₂ + 3y₂²)
  = (x₁ + y₁√−5)(x₁ − y₁√−5) · 2[x₂ + (y₂/2)(1+√−5)][x₂ + (y₂/2)(1−√−5)]
  = 2(x₁ + y₁√−5)[x₂ + (y₂/2)(1+√−5)] · (x₁ − y₁√−5)[x₂ + (y₂/2)(1−√−5)]
  = 2[x₁x₂ − y₁x₂ − 3y₁y₂ + ((x₁y₂ + 2y₁x₂ + y₁y₂)/2)(1+√−5)] × its conjugate
  = 2X² + 2XY + 3Y²,

where X = x₁x₂ − y₁x₂ − 3y₁y₂, Y = x₁y₂ + 2y₁x₂ + y₁y₂.
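
Lagrange's two identities can be spot-checked over a grid of small integers. The Python sketch below is a numerical check, not a derivation; the function names A, B, compose_BB, and compose_AB are our own labels, and the formulas for X and Y are the ones stated in the text.

```python
from itertools import product

def A(x, y):
    """The form x^2 + 5y^2."""
    return x * x + 5 * y * y

def B(x, y):
    """The form 2x^2 + 2xy + 3y^2."""
    return 2 * x * x + 2 * x * y + 3 * y * y

def compose_BB(x1, y1, x2, y2):
    """X, Y with B(x1, y1) * B(x2, y2) = A(X, Y)."""
    X = 2 * x1 * x2 + x1 * y2 + x2 * y1 - 2 * y1 * y2
    Y = x1 * y2 + x2 * y1 + y1 * y2
    return X, Y

def compose_AB(x1, y1, x2, y2):
    """X, Y with A(x1, y1) * B(x2, y2) = B(X, Y)."""
    X = x1 * x2 - y1 * x2 - 3 * y1 * y2
    Y = x1 * y2 + 2 * y1 * x2 + y1 * y2
    return X, Y

grid = range(-4, 5)
ok_BB = all(B(x1, y1) * B(x2, y2) == A(*compose_BB(x1, y1, x2, y2))
            for x1, y1, x2, y2 in product(grid, repeat=4))
ok_AB = all(A(x1, y1) * B(x2, y2) == B(*compose_AB(x1, y1, x2, y2))
            for x1, y1, x2, y2 in product(grid, repeat=4))
print(ok_BB, ok_AB)
```

Since both sides are polynomials of degree 2 in each variable, agreement on a 9 × 9 × 9 × 9 grid already forces the identities to hold for all integers.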
But does something similar happen for any determinant?

Lagrange’s results on the two inequivalent forms with determinant 5
were early steps in a theory later known as composition of quadratic forms.
The first steps were Diophantus’ identity and Brahmagupta’s generalization
of it:

(x₁² − ny₁²)(x₂² − ny₂²) = X² − nY²,

where X = x₁x₂ + ny₁y₂ and Y = x₁y₂ + y₁x₂. The Brahmagupta identity
says that “the form x² − ny², composed with itself, yields itself”. If we
denote the form x² + 5y² by A and the form 2x² + 2xy + 3y² by B, then
Lagrange’s results (combined with Brahmagupta’s) say that the composites
of A and B have the following “multiplication table”:

A² = A,   AB = BA = B,   B² = A.

We recognize this as the multiplication table for the two-element group
with identity element A. Today it is called the class group for Q(√−5)
and it is defined in an entirely different way, using ideals.

The classes A, B of the inequivalent forms x² + 5y², 2x² + 2xy + 3y²
correspond to two classes of ideals of Z[√−5]: the class A* of principal
ideals and the class B* of nonprincipal ideals (in this ring, all nonprincipal
ideals are equivalent in the sense of having the same shape). The products
of ideals computed in the previous section show that these ideal classes A*
and B* have the same multiplication table as the forms A and B.

I think it will be agreed that it is easier to multiply ideals than to com- 
pose forms, but it is not easy to see why they are really the same thing. For 
this reason, replacement of forms by ideals was a momentous change of 
direction for number theory. Early work in the direction of ideals, such as 
Euler’s use of quadratic integers, was snuffed out when Gauss apparently 
realized that unique prime factorization could fail. His way around this 
obstacle was to develop Lagrange’s theory of quadratic forms for arbitrary 
determinant, which he did, in stupefying complexity, in his Disquisitiones 
of 1801. 

Gauss took a small step toward a rigorous theory of quadratic integers
with his proof, published in 1832, of unique factorization in Z[i].
This effectively made the theory of the form x² + y² obsolete. However,
it was only in the 1840s and 1850s that Dirichlet, Kummer, Kronecker, and
Dedekind began developing general alternatives to composition of forms.
As mentioned earlier, Kummer had the idea of “ideal numbers”, and he
used it successfully in the theory of cyclotomic integers, where there was
no viable alternative. The easier theory of quadratic integers did not emerge
until later, perhaps because Dirichlet and Dedekind had invested a lot of 
time in simplifying the competing theory of composition of forms. 

Composition of forms began to fade only in the 1870s, when Dedekind 
developed ideal theory for algebraic integers of arbitrary degree. In 1877 he 
gave a careful exposition of the Z[√−5] example to motivate his general
theory of ideals of algebraic integers. His very readable little book, in 
English translation as Dedekind (1877), is recommended for its insight into 
Dedekind’s struggle to make “ideal numbers” actual. Also recommended 
is the translator’s introduction, which discusses the historical steps from 
quadratic forms to algebraic integers in rather more detail than is possible 
here. 


PREVIEW 


In this final chapter we regain unique prime factorization in rings
such as Z[√−5] by using prime ideals instead of prime numbers.

We first note the pure algebraic meaning of ideals in a ring R: ideals
are the subsets I for which the notion of “congruence mod I” makes
sense. This generalizes the idea of congruence mod n in Z, and
there is a corresponding generalization of the ring Z/nZ, namely
the quotient ring R/I.

Properties of the ideal I (in particular, being prime or maximal)
are reflected in properties of the quotient ring R/I (being an integral
domain or field respectively). For the integers of an imaginary
quadratic field, “prime” turns out to be equivalent to “maximal”,
which helps to prove the key property of ideals: that “contains”
means “divides”: B ⊇ A ⇒ A = BC for some ideal C.

The first step is to introduce the conjugate Ā of an ideal A, and to
prove that AĀ is a principal ideal. This is used to boost results about
principal ideals (which are easy because principal ideals behave like
numbers) to results about general ideals.

We use this strategy to prove that “contains” means “divides” and to
obtain unique factorization of ideals into prime ideals.

Returning finally to the specific case of Z[√−5], we take a brief look
at the concept of ideal classes, in order to show that all nonprincipal
ideals of Z[√−5] have the same shape. We need this result to


of the primes of the form x² + 5y².


12.1 Ideals and congruence


We now know that ideals can serve as “ideal numbers” in situations where
actual numbers seem to be lacking, for example in Z[√−5], where 2 and
1+√−5 should have a gcd ≠ 1 but don’t. However, ideals also have a
natural abstract function: an ideal I in a ring R is a subset of R for which
“congruence mod I” makes sense.

Given an ideal I, we define congruence mod I by

a ≡ b (mod I) ⇔ a − b ∈ I.

Then the equivalence properties of ≡ follow from closure properties of I:

• a ≡ a (mod I)
  because a ∈ I ⇒ −a ∈ I, since multiples of −1 are in I, which in turn
  implies a + (−a) = 0 ∈ I by closure under +.

• a ≡ b (mod I) ⇒ b ≡ a (mod I)
  because a − b ∈ I ⇒ b − a ∈ I, by closure under multiplying by −1
  again.

• a ≡ b (mod I) and b ≡ c (mod I) ⇒ a ≡ c (mod I)
  because a − b ∈ I and b − c ∈ I ⇒ (a − b) + (b − c) = a − c ∈ I, by
  closure under +.


It follows that R is partitioned into congruence classes I + a, where

I + a = {i + a : i ∈ I}.

Moreover, it is meaningful to add and multiply classes by the rules

(I + a) + (I + b) = I + (a + b),
(I + a)(I + b) = I + ab.

This can be proved in exactly the same way as in Section 3.2 for
congruence classes nZ + a and nZ + b in Z.

Any element of I + a is of the form k + a for some k ∈ I, so if we add
it to an arbitrary element l + b of I + b, where l ∈ I, we get

(k + l) + (a + b),

which is in I + (a + b) because k + l ∈ I by closure under sums.

If we multiply the element k + a ∈ I + a by l + b ∈ I + b we get

kl + kb + la + ab,

which is in I + ab because

k, l ∈ I ⇒ kl, kb, la ∈ I      by closure under products by members of R
         ⇒ kl + kb + la ∈ I    by closure under sums.

(It should be mentioned at this point that we are assuming R to be
commutative, as is the case for the number rings we are interested in. For
noncommutative rings one has to distinguish between left and right
ideals.)

Finally, the set R/I of congruence classes, under the + and × operations
just defined, inherits the ring properties from R. For example,
multiplication is commutative in R/I because it is in R:

(I + a)(I + b) = I + ab            by definition of × in R/I
               = I + ba            by commutativity in R
               = (I + b)(I + a)    by definition of × in R/I.

The other properties can be checked similarly, and hence R/I is a ring,
called the quotient ring of R by the ideal I.
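
The claim that sums and products of classes do not depend on the chosen representatives can be tested on a concrete ideal. The sketch below takes I = (3) + (1+√−5) = {3m + (1+√−5)n : m, n ∈ Z} from Section 11.7, stores a + b√−5 as the pair (a, b), and labels the class I + u by (a − b) mod 3; this labelling and the names add, mul, and cls are assumptions of the illustration, justified by the fact that u − b(1+√−5) is the ordinary integer a − b.

```python
from itertools import product

def add(u, v):
    """Sum in Z[sqrt(-5)]."""
    return (u[0] + v[0], u[1] + v[1])

def mul(u, v):
    """Product in Z[sqrt(-5)] of u = (a, b) and v = (c, d)."""
    (a, b), (c, d) = u, v
    return (a * c - 5 * b * d, a * d + b * c)

def cls(u):
    """Label 0, 1 or 2 of the class I + u, where I = (3) + (1 + sqrt(-5))."""
    a, b = u
    return (a - b) % 3

ring = [(a, b) for a in range(-4, 5) for b in range(-4, 5)]

# For each pair of classes, collect the classes of all sums and products
# of representatives; the operations are well defined if each set has size 1.
sums, prods = {}, {}
for u, v in product(ring, repeat=2):
    key = (cls(u), cls(v))
    sums.setdefault(key, set()).add(cls(add(u, v)))
    prods.setdefault(key, set()).add(cls(mul(u, v)))

well_defined = (all(len(s) == 1 for s in sums.values())
                and all(len(s) == 1 for s in prods.values()))
print(well_defined)
print({key: next(iter(s)) for key, s in sorted(prods.items())})
```

The printed multiplication table is that of Z/3Z, in keeping with the three kinds of elements described for this ideal in Section 11.7.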

If one pursues the study of R/I following the model of Z/nZ in Chapter
3, then the next question is: for which ideals I is R/I a field? In Z, this
happens when I is prime, but in general rings the answer is not so simple.
We take up the question in the next section.


Exercises 
12.1.1 Which congruence classes play the roles of 1 and 0 in R/I?

12.1.2 Check the other ring properties of R/I.

It is useful to visualize the congruence classes for some actual ideals I, say in
Z[√−5].

12.1.3 Pick out the congruence classes of I = (2) + (1+√−5) in Figure 11.7.

12.1.4 Sketch a picture of I = (3) + (1+√−5) in Z[√−5] and show that it has
three congruence classes.


12.2 Prime and maximal ideals


In Section 11.7 we saw that maximal ideals are prime. However, prime
ideals are not always maximal, and the difference between the two is nicely
captured by properties of the quotient ring R/I of R by the ideal I.


Characterization of prime ideals. I is a prime ideal of a ring R ⇔ R/I
has no zero divisors. (Zero divisors are nonzero congruence classes I + a,
I + b whose product I + ab is I, the class of 0.)

Proof. (⇒) Suppose I is prime, so we have to prove that R/I has no zero
divisors.

I + ab is the class of 0 ⇒ ab ∈ I
    ⇒ a ∈ I or b ∈ I                     since I is prime
    ⇒ I + a or I + b is the class of 0
    ⇒ R/I has no zero divisors.

(⇐) Suppose R/I has no zero divisors, so we have to prove that I is a prime
ideal.

ab ∈ I ⇒ I + ab = I (the class of 0)
    ⇒ (I + a)(I + b) = I          by definition of product of congruence classes
    ⇒ I + a = I or I + b = I      since R/I has no zero divisors
    ⇒ a ∈ I or b ∈ I
    ⇒ I is a prime ideal. □


Characterization of maximal ideals. I is a maximal ideal of a ring R ⇔
R/I is a field. (That is, each nonzero member of R/I has a multiplicative
inverse.)

Proof. (⇒) Suppose I is maximal, so we have to prove that R/I is a field.

I + a a nonzero congruence class ⇒ a ∉ I
    ⇒ ideal {ir + as : r, s ∈ R, i ∈ I} = R    by maximality of I
    ⇒ 1 = ir + as for some r, s ∈ R, i ∈ I
    ⇒ I + as = I + 1
    ⇒ I + a has inverse I + s
    ⇒ R/I is a field.

(⇐) Suppose each nonzero congruence class I + a has an inverse I + s, so
we have to prove that I is maximal, that is, the only ideal containing I and
an a ∉ I is R.

a ∉ I ⇒ I + a is a nonzero congruence class
    ⇒ I + as = I + 1 for some s ∈ R,
          namely, for the inverse class I + s of I + a
    ⇒ 1 = ir + as for some r ∈ R and i ∈ I
    ⇒ any ideal containing I and a includes 1 and hence equals R
    ⇒ I is maximal. □
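
Both characterizations can be explored by brute force in small quotients of Z[√−5]. The sketch below assumes that the quotient by a principal ideal (n), for an ordinary integer n, is represented by pairs (a mod n, b mod n) standing for a + b√−5; the choices n = 3 and n = 11 and all function names are ours. The case n = 3 agrees with the factorization (3) = (3, 1+√−5)(3, 1−√−5) of Section 11.8, which shows that (3) is not prime.

```python
from itertools import product

def quotient_ring(n):
    """Elements of Z[sqrt(-5)]/(n), as pairs (a mod n, b mod n)."""
    return list(product(range(n), repeat=2))

def mul_mod(u, v, n):
    """Product of classes in Z[sqrt(-5)]/(n)."""
    (a, b), (c, d) = u, v
    return ((a * c - 5 * b * d) % n, (a * d + b * c) % n)

def zero_divisors(n):
    """Nonzero classes whose product with some nonzero class is zero."""
    zero = (0, 0)
    nonzero = [u for u in quotient_ring(n) if u != zero]
    return [u for u in nonzero
            if any(mul_mod(u, v, n) == zero for v in nonzero)]

def is_field(n):
    """Does every nonzero class have a multiplicative inverse?"""
    one, zero = (1 % n, 0), (0, 0)
    nonzero = [u for u in quotient_ring(n) if u != zero]
    return all(any(mul_mod(u, v, n) == one for v in nonzero)
               for u in nonzero)

print(zero_divisors(3), is_field(3))     # zero divisors exist: (3) is not prime
print(zero_divisors(11), is_field(11))   # none found: the quotient is a field
```

For n = 11 the search finds no zero divisors and an inverse for every nonzero class, which by the two theorems above indicates that (11) is both prime and maximal in Z[√−5].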


Remark. A ring with no zero divisors is called an integral domain. Any 
field is an integral domain, but an integral domain is not necessarily a field. 
For example, Z is an integral domain but not a field. However, an integral 
domain always has the following cancellation property in common with 
fields: 

ab = ac and a ≠ 0 ⇒ b = c.

This is because

ab = ac ⇒ ab − ac = 0
        ⇒ a(b − c) = 0
        ⇒ b − c = 0    since a ≠ 0 and there are no zero divisors
        ⇒ b = c.


Exercises 


The principal ideal (2) of Z[√−5] is neither prime nor maximal, in keeping
with its factorization (2) = (2, 1+√−5)² found in Section 11.8.

12.2.1 Find the three nonzero elements of Z[√−5]/(2) and show that one of
them is a zero divisor, hence (2) is not a prime ideal.


12.2.2 Why is (2) not maximal in Z[√−5]?


12.3 Prime ideals of imaginary quadratic fields


There is generally a big difference between integral domains and fields,
and a correspondingly big difference between prime and maximal ideals.
However, there is one important case where integral domains are always
fields, namely, when they are finite.


Lemma. A finite integral domain is a field. 


Proof. Let a be a nonzero element of a finite integral domain D, and
consider a, a², a³, ... Since D is finite, some value a^m in this sequence
recurs as a later value a^(m+n), so

a^(m+n) = a^m, and hence a^n = 1 by cancellation.

But this means a · a^(n−1) = 1, so a has the multiplicative inverse a^(n−1)
and hence D is a field. □
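
The proof is constructive, and it can be run as an algorithm. The following Python sketch carries it out in the finite integral domain Z/pZ for a prime p; the choice of Z/pZ and the function name inverse_by_powers are assumptions made for illustration.

```python
def inverse_by_powers(a, p):
    """Inverse of a in Z/pZ (p prime, a not divisible by p), found as in the lemma."""
    seen = {}                 # value of a power -> exponent where it first occurred
    x, k = a % p, 1
    while x not in seen:      # list a, a^2, a^3, ... until a value recurs
        seen[x] = k
        x, k = (x * a) % p, k + 1
    m = seen[x]               # a^m equals the later power a^(m+n)
    n = k - m
    # Cancellation gives a^n = 1, so a^(n-1) is the inverse of a.
    return pow(a, n - 1, p)

for a in range(1, 7):
    inv = inverse_by_powers(a, 7)
    print(a, inv, (a * inv) % 7)
```

The loop must terminate because only finitely many values are available, which is exactly the finiteness used in the proof.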


In view of this lemma, and the characterizations of prime and maximal
ideals in Section 12.2, a prime ideal I in a ring R will be maximal whenever
R/I is finite. This leads to the following theorem.


Maximality of prime ideals in imaginary quadratic fields. A prime ideal 
in the integers of an imaginary quadratic field is maximal. 


Proof. Let R be the ring of integers of the quadratic field and / a prime 
ideal. By the results above, it suffices to show that R// is finite; in other 
words, that there are finitely many congruence classes /+ 7 as r ranges over 
R. But this is clear from the lattice property of ideals in Section 11.6. The 
congruence classes /-+r are represented by the r in one parallelogram of 
the lattice / (Figure 12.1) because an r’ in any other parallelogram of / is 
congruent (mod /) to such an r. 


© e) oO 0 © 


© .@) 0 O @ 


Figure 12.1: One parallelogram of the lattice /. 


Thus there are only finitely many congruence classes $I + r$. □
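
The finiteness of $R/I$ can be checked by brute force in a small case. The sketch below is an illustration under stated assumptions: it takes $R = \mathbb{Z}[\sqrt{-5}]$, stores $x + y\sqrt{-5}$ as the pair `(x, y)`, and is handed an integral basis `u`, `v` of the ideal; the function names are hypothetical. It collects elements of a small box that are pairwise incongruent mod $I$, and the count agrees with the number of points of $R$ in one parallelogram of the lattice.

```python
def in_lattice(d, u, v):
    """Is d an integer combination m*u + n*v?  Solved by Cramer's rule."""
    det = u[0] * v[1] - u[1] * v[0]
    m_num = d[0] * v[1] - d[1] * v[0]
    n_num = u[0] * d[1] - u[1] * d[0]
    return m_num % det == 0 and n_num % det == 0


def count_classes(u, v, box=4):
    """Count congruence classes I + r met by the r = (x, y) with |x|, |y| <= box."""
    representatives = []
    for x in range(-box, box + 1):
        for y in range(-box, box + 1):
            if not any(in_lattice((x - rx, y - ry), u, v)
                       for rx, ry in representatives):
                representatives.append((x, y))
    return len(representatives)


# (2, 1 + sqrt(-5)) has integral basis 2, 1 + sqrt(-5); (3) has basis 3, 3*sqrt(-5).
print(count_classes((2, 0), (1, 1)), count_classes((3, 0), (0, 3)))
```

The box only has to be large enough to meet every class; for the two ideals shown a box of half-width 4 is ample.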
Exercises


We have restricted attention to imaginary quadratic fields because their integers
are easy to visualize and most of our motivating examples lead to them. A ring
of real quadratic integers, such as $\mathbb{Z}[\sqrt{2}]$, is not a “lattice” in the literal sense
because it lies densely along the real axis. However, it is obvious that $\mathbb{Z}[\sqrt{2}]$ has
the integral basis $1, \sqrt{2}$ and we can prove that its ideals have integral bases too.

The idea is to map elements $a + b\sqrt{2}$ of $\mathbb{Z}[\sqrt{2}]$ to points $a + bi\sqrt{2}$ of $\mathbb{C}$. Then
we can apply geometric arguments to the image points.


12.3.1 Show that any subset $I$ of $\mathbb{Z}[\sqrt{2}]$ closed under + and − is thereby mapped
to a subset $I^*$ of $\mathbb{C}$ with preservation of sums and differences. If $I$ is an
ideal, show that $I^*$ does not lie in a line.


12.3.2 Deduce from Exercise 12.3.1 and the lattice theorem of Section 11.6 that
$I$ has an integral basis.


The same argument obviously applies to the integers of any real quadratic field.
Finally, the parallelogram argument used above can be carried over to these
integers.


12.3.3 Show that the map $a + b\sqrt{2} \mapsto a + bi\sqrt{2}$ sends cosets of $I$ to cosets of $I^*$,
and deduce that $\mathbb{Z}[\sqrt{2}]/I$ is finite. (And similarly for the integers of any
real quadratic field.)


12.4 Conjugate ideals 


The key to the success of ideal prime factorization in quadratic fields is the 
fact, proved below, that every ideal divides a principal ideal. This allows us 
to replace questions about ideal factors by easier questions about principal 
ideal factors. The trick is to multiply an ideal $A$ by its conjugate $\bar{A}$, the set
of conjugates of elements of A. Just as the product of conjugate quadratic 
integers is an ordinary integer k, the product of conjugate ideals turns out 
to be the ideal (k) of multiples of an ordinary integer. 


Product of conjugate ideals. If $R$ is the ring of integers of an imaginary
quadratic field and $A$ is an ideal of $R$, then


$$A\bar{A} = (k) \quad \text{for some } k \in \mathbb{Z}.$$


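Proof. We know from Section 11.6 that $A = \{\alpha m + \beta n : m, n \in \mathbb{Z}\}$ for two
integers $\alpha$ and $\beta$ of $R$. Hence $\bar{A} = \{\bar{\alpha} m + \bar{\beta} n : m, n \in \mathbb{Z}\}$ and, by definition
of product of ideals, $A\bar{A} = \{s\alpha\bar{\alpha} + t\beta\bar{\beta} + u\bar{\alpha}\beta + v\alpha\bar{\beta} : s, t, u, v \in \mathbb{Z}\}$.

Now $\alpha\bar{\alpha}$, $\beta\bar{\beta}$, and $\bar{\alpha}\beta + \alpha\bar{\beta}$ are self-conjugate, and hence real. We first
find an ordinary integer $k$ that divides them, and then see what follows:

$$\begin{aligned}
&\alpha\bar{\alpha},\ \beta\bar{\beta},\ \bar{\alpha}\beta + \alpha\bar{\beta} \ \text{ real integers of } R \\
&\Rightarrow \alpha\bar{\alpha},\ \beta\bar{\beta},\ \bar{\alpha}\beta + \alpha\bar{\beta} \in \mathbb{Z} \quad \text{by the nature of integers of } R \text{ (Section 10.4)} \\
&\Rightarrow \gcd(\alpha\bar{\alpha},\ \beta\bar{\beta},\ \bar{\alpha}\beta + \alpha\bar{\beta}) = k \in \mathbb{Z} \\
&\Rightarrow k = p\alpha\bar{\alpha} + q\beta\bar{\beta} + r(\bar{\alpha}\beta + \alpha\bar{\beta}) \quad \text{for some } p, q, r \in \mathbb{Z}, \text{ by the Euclidean algorithm} \\
&\Rightarrow k \in A\bar{A} \\
&\Rightarrow (k) \subseteq A\bar{A}.
\end{aligned}$$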
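Conversely, to show that $(k) \supseteq A\bar{A}$, it suffices to show that $k$ divides
the four generators $\alpha\bar{\alpha}$, $\beta\bar{\beta}$, $\bar{\alpha}\beta$, and $\alpha\bar{\beta}$ of $A\bar{A}$. It divides $\alpha\bar{\alpha}$ and $\beta\bar{\beta}$
by construction, and it divides $\bar{\alpha}\beta$ and $\alpha\bar{\beta}$ provided $(\bar{\alpha}\beta)/k$ and $(\alpha\bar{\beta})/k$
belong to $R$. The latter numbers are the roots of

$$\left(x - \frac{\bar{\alpha}\beta}{k}\right)\left(x - \frac{\alpha\bar{\beta}}{k}\right) = x^2 - \frac{\bar{\alpha}\beta + \alpha\bar{\beta}}{k}\,x + \frac{\alpha\bar{\alpha}}{k}\cdot\frac{\beta\bar{\beta}}{k},$$

the coefficients of which we know to be ordinary integers. Hence $(\bar{\alpha}\beta)/k$
and $(\alpha\bar{\beta})/k$ are quadratic integers, and hence members of $R$. □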
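
The integer $k$ in the proof is easy to compute once an integral basis is known. The sketch below is illustrative: it assumes $R = \mathbb{Z}[\sqrt{-5}]$, stores $x + y\sqrt{-5}$ as the pair `(x, y)`, and uses hypothetical function names. For $\alpha = x_1 + y_1\sqrt{-5}$ and $\beta = x_2 + y_2\sqrt{-5}$ one has $\alpha\bar{\alpha} = x_1^2 + 5y_1^2$ and $\bar{\alpha}\beta + \alpha\bar{\beta} = 2(x_1x_2 + 5y_1y_2)$.

```python
from math import gcd


def norm(u):
    """alpha * conjugate(alpha) for alpha = x + y*sqrt(-5)."""
    x, y = u
    return x * x + 5 * y * y


def cross_term(u, v):
    """conj(alpha)*beta + alpha*conj(beta), which is an ordinary integer."""
    (x1, y1), (x2, y2) = u, v
    return 2 * (x1 * x2 + 5 * y1 * y2)


def conjugate_product_generator(alpha, beta):
    """The k with A * conj(A) = (k), for A with integral basis alpha, beta."""
    return gcd(gcd(norm(alpha), norm(beta)), cross_term(alpha, beta))


# A = (2, 1 + sqrt(-5)) and A = (3, 1 + sqrt(-5)), each given by an integral basis.
print(conjugate_product_generator((2, 0), (1, 1)))
print(conjugate_product_generator((3, 0), (1, 1)))
```

For a principal ideal $(\gamma)$ with integral basis $\gamma, \gamma\sqrt{-5}$ the cross term vanishes and the function returns the norm of $\gamma$, as expected from $(\gamma)(\bar{\gamma}) = (\gamma\bar{\gamma})$.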


Exercises 


The proof above assumes the lattice property of ideals, or rather that each ideal
has an integral basis $\alpha, \beta$, and we know from the previous exercise set that this is
also true for rings of integers in real quadratic fields.


12.4.1 Check that the proof above works for real quadratic fields, where the
conjugation operation now is $a + b\sqrt{d} \mapsto a - b\sqrt{d}$.


The product of conjugate ideals helps to establish the existence of the ideal 
class group, because it shows the existence of inverses. The identity class of this 
group is the class of principal ideals, so the theorem says that any ideal (class) 
times its conjugate (class) is the identity class, hence conjugate ideal classes are 
inverse to each other. 


12.4.2 Bearing in mind that equivalent ideals are the same shape, show that the
ideal $(2, 1 + \sqrt{-5})$ is self-conjugate, hence self-inverse.


12.4.3 We already know that $(2, 1 + \sqrt{-5})$ is self-inverse. Why?


12.5 Divisibility and containment


We are now almost ready to prove that “contains means divides” for all 
ideals in the ring of integers of an imaginary quadratic field. We know that 
this is the case for principal ideals, and the only other result we need is the 
following cancellation property. 


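Cancellation of ideals. If $A$, $B$, $C$ are nonzero ideals of $R$ and $AB \supseteq AC$,
then $B \supseteq C$.

Proof. In the special case where $A = (\alpha)$ is principal

$$\begin{aligned}
AB \supseteq AC &\Rightarrow (\alpha)B \supseteq (\alpha)C \\
&\Rightarrow \alpha B \supseteq \alpha C \quad \text{because the ideals } B \text{ and } C \text{ are closed under multiplication} \\
&\qquad\qquad\qquad\ \ \text{by multiples of } \alpha, \text{ and hence } (\alpha)B = \alpha B \text{ and } (\alpha)C = \alpha C \\
&\Rightarrow B \supseteq C, \quad \text{multiplying both sides by } \alpha^{-1}.
\end{aligned}$$

In the general case,

$$\begin{aligned}
AB \supseteq AC &\Rightarrow \bar{A}AB \supseteq \bar{A}AC, \quad \text{multiplying both sides by } \bar{A} \\
&\Rightarrow (k)B \supseteq (k)C \quad \text{for some } k, \text{ by Section 12.4} \\
&\Rightarrow B \supseteq C \quad \text{by the special case.} \qquad \square
\end{aligned}$$
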
This is our first application of the trick from Section 12.4, namely multiplying
an ideal by its conjugate to reduce to the easier case of a principal ideal.
Cancellation now allows us to extend this trick further: we can multiply an
ideal by its conjugate to reduce to the special (and easy) case of a principal
ideal, then cancel the conjugate to get back to the general case. This is how
we prove that “contains means divides.”


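“Contains means divides.” If $A$ and $B$ are ideals of $R$ and $B \supseteq A$ then $B$
divides $A$; that is,
$A = BC$ for some ideal $C$.


Proof. In the special case where $B = (\beta)$ is a principal ideal

$$\begin{aligned}
B \supseteq A &\Rightarrow (\beta) \supseteq A \\
&\Rightarrow \beta \text{ divides each member of } A \\
&\Rightarrow A = (\beta)\{\alpha/\beta : \alpha \in A\} \\
&\Rightarrow A = BC
\end{aligned}$$

where $B$ is the ideal $(\beta)$ and $C = \{\alpha/\beta : \alpha \in A\}$ is also an ideal of $R$. The
$\alpha/\beta$ in $C$ belong to $R$ since each $\alpha$ is divisible by $\beta$, and they are closed
under + and under multiplication by elements of $R$ since this is true of the
$\alpha$ in $A$.

In the general case

$$\begin{aligned}
B \supseteq A &\Rightarrow B\bar{B} \supseteq A\bar{B}, \quad \text{multiplying both sides by } \bar{B} \\
&\Rightarrow (k) \supseteq A\bar{B} \quad \text{by Section 12.4} \\
&\Rightarrow A\bar{B} = (k)C \quad \text{for some ideal } C, \text{ by the special case} \\
&\Rightarrow A\bar{B} = B\bar{B}C \quad \text{since } (k) = B\bar{B} \\
&\Rightarrow A = BC \quad \text{by cancellation of the ideal } \bar{B}. \qquad \square
\end{aligned}$$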
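
As an illustration (an example chosen here, not taken from the text), the general case can be traced in $\mathbb{Z}[\sqrt{-5}]$ with $A = (1+\sqrt{-5})$ and $B = (2, 1+\sqrt{-5})$, which contains $A$. The sketch assumes that $\bar{B} = (2, 1-\sqrt{-5})$ and that $B\bar{B} = (2)$, the latter being the value $k = \gcd(4, 6, 4) = 2$ produced by the proof in Section 12.4 from the integral basis $2, 1+\sqrt{-5}$.

```latex
\begin{aligned}
B \supseteq A
  &\Rightarrow B\bar{B} \supseteq A\bar{B} \\
  &\Rightarrow (2) \supseteq (1+\sqrt{-5})\,(2,\ 1-\sqrt{-5}) = (2+2\sqrt{-5},\ 6) \\
  &\Rightarrow A\bar{B} = (2)\,(1+\sqrt{-5},\ 3) \\
  &\Rightarrow A\bar{B} = B\bar{B}\,(3,\ 1+\sqrt{-5}) \\
  &\Rightarrow A = B\,(3,\ 1+\sqrt{-5}).
\end{aligned}
```

So in this instance the ideal $C$ of the theorem would be $(3, 1+\sqrt{-5})$, and the containment $B \supseteq A$ becomes the factorization $(1+\sqrt{-5}) = (2, 1+\sqrt{-5})(3, 1+\sqrt{-5})$.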


12.6 Factorization of ideals 


We are now ready to prove existence and uniqueness of prime ideal factorization
in the ring $R$ of integers of an imaginary quadratic field. Since prime
ideals are maximal in this setting, the usual process of finding smaller and 
smaller factors is replaced by a process of finding larger and larger ideals. 


Existence. Every nonzero ideal $A \neq R$ is a product of prime ideals.


Proof. If $A$ is not prime, it is not maximal by Section 11.7, hence there is
an ideal $B \supset A$ with $B \neq R$. Since “contains means divides,” it follows that
$A = BC$ for some ideal $C$.

If $B$ or $C$ is not prime we factorize it similarly, and so on. This process
terminates in a finite number of steps (giving a prime factorization) because
each nonzero ideal $I$ has only a finite number of congruence classes $I + r$
(Section 12.3) and each extension to a larger ideal absorbs at least one such
class. □


Uniqueness. The factorization of a nonzero ideal into prime ideals is 
unique, up to the order of factors. 


Proof. As always, uniqueness follows from existence and a prime divisor
property, in this case: if a prime ideal $P$ divides a product of ideals $AB$ then
$P$ divides $A$ or $P$ divides $B$.

By definition (Section 11.7) a prime ideal $P$ has the property that if
$P \supseteq AB$ then $P \supseteq A$ or $P \supseteq B$. The prime divisor property now follows
because “contains means divides.” □
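
Ideal factorizations in $\mathbb{Z}[\sqrt{-5}]$ can be checked mechanically by comparing lattices. The following sketch is one possible approach, not the text's method: it stores $x + y\sqrt{-5}$ as the pair `(x, y)`, and the hypothetical helper `ideal` reduces a list of generators to a canonical integral basis $a,\ b + c\sqrt{-5}$ by running the Euclidean algorithm on coordinates. Two ideals are equal exactly when their canonical forms agree.

```python
from math import gcd

SQRT = (0, 1)  # the pair (x, y) stands for x + y*sqrt(-5)


def mul(u, v):
    """Product in Z[sqrt(-5)]."""
    (x1, y1), (x2, y2) = u, v
    return (x1 * x2 - 5 * y1 * y2, x1 * y2 + x2 * y1)


def ideal(gens):
    """Canonical form (a, b, c) of the ideal generated by gens: the lattice
    with Z-basis a and b + c*sqrt(-5), where a > 0, c > 0 and 0 <= b < a."""
    px, py, a = 0, 0, 0
    for g in gens:
        for x, y in (g, mul(g, SQRT)):    # g and g*sqrt(-5) span R*g over Z
            while y != 0:                 # Euclid on the sqrt(-5) coordinates
                q = py // y
                px, py, x, y = x, y, px - q * x, py - q * y
            a = gcd(a, x)                 # the leftover (x, 0) is an ordinary integer
    if py < 0:
        px, py = -px, -py
    return (a, px % a, py)


def product(gens1, gens2):
    """The product ideal is generated by the pairwise products of generators."""
    return ideal([mul(u, v) for u in gens1 for v in gens2])


P2, P3, P3bar = [(2, 0), (1, 1)], [(3, 0), (1, 1)], [(3, 0), (1, -1)]
print(product(P3, P3bar) == ideal([(3, 0)]))   # (3) = (3, 1+sqrt(-5))(3, 1-sqrt(-5))
print(product(P2, P2) == ideal([(2, 0)]))      # (2) = (2, 1+sqrt(-5))^2
print(product(P2, P3) == ideal([(1, 1)]))      # (1+sqrt(-5)) = (2, 1+sqrt(-5))(3, 1+sqrt(-5))
```

In the canonical form the product $ac$ is the number of congruence classes of the ideal, so the same helper also gives the size of the quotient ring.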
Exercises


According to unique prime ideal factorization, distinct prime number factorizations
should split into the same prime ideal factorization. We saw this happen in
Section 11.8 with the two prime number factorizations of 6 in $\mathbb{Z}[\sqrt{-5}]$. Another
example in $\mathbb{Z}[\sqrt{-5}]$ is

$$9 = 3 \times 3 = (2 + \sqrt{-5})(2 - \sqrt{-5}).$$


12.6.1 Use norms to show that 3, $2 + \sqrt{-5}$, and $2 - \sqrt{-5}$ are prime numbers of
$\mathbb{Z}[\sqrt{-5}]$.

Now we know from Section 11.8 that $(3) = (3, 1 + \sqrt{-5})(3, 1 - \sqrt{-5})$, so
these prime ideal factors of 3 should also be ideal factors of $2 + \sqrt{-5}$ and $2 - \sqrt{-5}$.


12.6.2 Show that $2 - \sqrt{-5} \in (3, 1 + \sqrt{-5})^2$, so $(2 - \sqrt{-5}) \subseteq (3, 1 + \sqrt{-5})^2$.


12.6.3 Show, conversely, that $2 - \sqrt{-5}$ divides the elements 9, $3 + 3\sqrt{-5}$, and
$-4 + 2\sqrt{-5}$ generating $(3, 1 + \sqrt{-5})^2$, so $(3, 1 + \sqrt{-5})^2 \subseteq (2 - \sqrt{-5})$.


Thus the factor $2 - \sqrt{-5}$ of 9 has prime ideal factorization $(3, 1 + \sqrt{-5})^2$,
and it remains to show that the other factor $2 + \sqrt{-5}$ has ideal prime factorization
$(3, 1 - \sqrt{-5})^2$.


12.6.4 $(2 - \sqrt{-5}) = (3, 1 + \sqrt{-5})^2$ implies $(2 + \sqrt{-5}) = (3, 1 - \sqrt{-5})^2$. Why?


12.7 Ideal classes 


As an application of unique prime ideal factorization, we determine the
primes of the form $x^2 + 5y^2$, thus solving the problem that puzzled Fermat
and Euler. To do this we need to know a little more about the ideals of
$\mathbb{Z}[\sqrt{-5}]$, namely that they fall into two classes: the principal ideals, all
of which have the shape of $\mathbb{Z}[\sqrt{-5}]$, and the nonprincipal ideals, all of
which have the shape of $(2, 1 + \sqrt{-5}) = \{2m + (1 + \sqrt{-5})n : m, n \in \mathbb{Z}\}$. In
general, the number of ideal shapes in the ring of integers of an imaginary
quadratic field is called its class number.


Ideal classes of $\mathbb{Z}[\sqrt{-5}]$. The class number of $\mathbb{Z}[\sqrt{-5}]$ is 2.


Proof. Let $I$ be a nonprincipal ideal of $\mathbb{Z}[\sqrt{-5}]$ and let $\alpha, \beta$ be an integral
basis of $I$ constructed as in the proof of the lattice property of ideals in
Section 11.6. That is, $\alpha$ is a nonzero element of minimal distance from 0,
and $\beta$ is of minimal distance from 0 among the elements of $I$ not on the
line through 0 and $\alpha$.

Since $I$ is an ideal, it includes the multiples $\alpha\sqrt{-5}$ and $\alpha(1 + \sqrt{-5})$
of $\alpha$, and 0. These four points together form the rectangle shown in Figure
12.2, a typical one in the principal ideal generated by $\alpha$. (For the sake of
simplicity, the line through 0 and $\alpha$ is shown horizontal in the diagram, but
it is not necessarily the real axis.)


Figure 12.2: A rectangle of multiples of $\alpha$ (corners 0, $\alpha$, $\alpha\sqrt{-5}$, and $\alpha(1 + \sqrt{-5})$)


Since $I$ is not the principal ideal $(\alpha)$, there are other members of $I$
inside the rectangle, and hence in at least one of the quarters shown. Without
loss of generality, we can assume that $\beta$ lies in the bottom left quarter (if
necessary, taking the difference between an element of $I$ elsewhere in the
rectangle and the nearest corner, or replacing $\alpha$ by $-\alpha$). Finally, since $\beta$ is
no nearer to 0 than $\alpha$, by construction of the integral basis in Section 11.6,
it lies in the shaded region of the bottom left quarter.

But then $2\beta$ lies in the shaded region in the top half of the rectangle,
each point of which is clearly at distance $< |\alpha|$ from either $\alpha(1 + \sqrt{-5})$ or
$\alpha\sqrt{-5}$. This implies that the element $\alpha(1 + \sqrt{-5}) - 2\beta$ or $\alpha\sqrt{-5} - 2\beta$
of $I$ has absolute value $< |\alpha|$, contrary to the choice of $\alpha$, unless

$$\beta = \alpha\,\frac{1 + \sqrt{-5}}{2} \quad \text{or} \quad \beta = \alpha\,\frac{\sqrt{-5}}{2}.$$

In the first case, $I$ has the same shape as the ideal $(2, 1 + \sqrt{-5})$, and
hence belongs to the same class. The second case is impossible, because

$$\begin{aligned}
\alpha\,\frac{\sqrt{-5}}{2} \in I &\Rightarrow -\frac{5\alpha}{2} \in I, \quad \text{multiplying by } \sqrt{-5}, \\
&\Rightarrow \frac{\alpha}{2} \in I, \quad \text{adding } 3\alpha,
\end{aligned}$$

contrary to the choice of $\alpha$ as an element of minimal absolute value. □


Exercises 


The argument used above can also be used for $\mathbb{Z}[\sqrt{-6}]$, but it takes an interesting
turn. The first possibility for $\beta$ does not produce a nonprincipal ideal, but the
second possibility does, so $\mathbb{Z}[\sqrt{-6}]$ has class number 2.


12.7.1 Show that $I = (2, 1 + \sqrt{-6}) = \{2m + (1 + \sqrt{-6})n : m, n \in \mathbb{Z}\}$ includes 1
and hence is $\mathbb{Z}[\sqrt{-6}]$, but that $J = (2, \sqrt{-6}) = \{2m + \sqrt{-6}\,n : m, n \in \mathbb{Z}\}$ is
a nonprincipal ideal.


12.7.2 Rerun the argument above, explaining why only the second possibility for
$\beta$ now produces a nonprincipal ideal, so that the class number of $\mathbb{Z}[\sqrt{-6}]$ is
2 and that all nonprincipal ideals of $\mathbb{Z}[\sqrt{-6}]$ are of the form $(\alpha, \alpha\sqrt{-6}/2)$.

Returning to $\mathbb{Z}[\sqrt{-5}]$, we know that the nonprincipal ideal $(3, 1 + \sqrt{-5})$ is in
the same class as $(2, 1 + \sqrt{-5})$, because we checked that it was the same shape in
the exercises to Section 11.7. However, the integral basis $3, 1 + \sqrt{-5}$ of this ideal
is not of the form $\alpha, \alpha(1 + \sqrt{-5})/2$.


proof above? 


12.8 Primes of the form $x^2 + 5y^2$


Now at last we know enough about $\mathbb{Z}[\sqrt{-5}]$ to be able to deal with the
quadratic form $x^2 + 5y^2$ and the primes it represents. First observe what we
can do with classical tools: congruences and quadratic reciprocity.


• Experience with the forms $x^2 + ny^2$ for $n = 1, 2, 3$ (Section 9.1) leads
  us to consider the values of $x^2 + 5y^2$ mod 20. The possible values
  of $x^2$ mod 20 are 1, 4, 9, 16, 5, and 0, hence the possible values of
  $5y^2$ mod 20 are 5 and 0. Prime values of $x^2 + 5y^2$ are odd and not
  divisible by 5, hence the possible prime values of $x^2 + 5y^2$ mod 20
  are 1 and 9. That is, primes of the form $x^2 + 5y^2$ are of the form
  $20n + 1$ or $20n + 9$.

• To deal with $x^2 + 5y^2$ we likewise expect to need the quadratic
  character of $-5$. When $p = 20n + 1$ or $20n + 9$,

  $$\begin{aligned}
  \left(\frac{-5}{p}\right) = \left(\frac{-1}{p}\right)\left(\frac{5}{p}\right) &= 1 \times \left(\frac{p}{5}\right) \quad \text{by quadratic reciprocity} \\
  &= 1 \times 1
  \end{aligned}$$

  since $20n + 1 \equiv 1 = 1^2 \pmod 5$ and $20n + 9 \equiv 4 = 2^2 \pmod 5$. Hence
  $-5$ is a square mod $p$ when $p = 20n + 1$ or $p = 20n + 9$.
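
Both observations are small enough to confirm by direct computation. The sketch below is illustrative only: the names are hypothetical, and the second observation is tested by brute-force search for a square root of $-5$ mod $p$ rather than by quadratic reciprocity.

```python
from math import isqrt

# First observation: residues of x^2, 5y^2 and x^2 + 5y^2 mod 20.
squares = sorted({x * x % 20 for x in range(20)})
five_squares = sorted({5 * y * y % 20 for y in range(20)})
values = sorted({(s + t) % 20 for s in squares for t in five_squares})
# Odd residues not divisible by 5 are the only ones a prime other than 2 or 5 can have.
prime_residues = [v for v in values if v % 2 == 1 and v % 5 != 0]


def is_prime(n):
    return n > 1 and all(n % d for d in range(2, isqrt(n) + 1))


def minus_five_is_square(p):
    """Brute-force test: is there an m with m^2 = -5 (mod p)?"""
    return any((m * m + 5) % p == 0 for m in range(p))


# Second observation, checked for the primes 20n+1 and 20n+9 below 500.
candidates = [p for p in range(2, 500) if is_prime(p) and p % 20 in (1, 9)]
print(squares, five_squares, prime_residues)
print(all(minus_five_is_square(p) for p in candidates))
```

The bound 500 is arbitrary; the check is a sanity test of the two bullet points, not a substitute for the reciprocity argument.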


We need prime ideal factorization to prove the converse of the first
observation, namely, that every prime of the form $20n + 1$ or $20n + 9$ is of
the form $x^2 + 5y^2$. Apart from the appearance of nonprincipal ideals, the
proof is similar to the one for $x^2 + y^2$ in Chapter 6.

Primes of the form $x^2 + 5y^2$. The primes of the form $x^2 + 5y^2$ are precisely
those of the form $20n + 1$ or $20n + 9$.

Proof. It remains to show that primes of the form $20n + 1$ or $20n + 9$ are of
the form $x^2 + 5y^2$.

The second observation above shows that $-5$ is a square, mod $p$, for
the primes $p = 20n + 1$ and $20n + 9$. In other words, for each such $p$ there
is an $m \in \mathbb{Z}$ such that

$$p \ \text{ divides } \ m^2 + 5 = (m + \sqrt{-5})(m - \sqrt{-5}).$$

However, $p$ does not divide either factor $m + \sqrt{-5}$ or $m - \sqrt{-5}$ in $\mathbb{Z}[\sqrt{-5}]$,
since $\frac{m}{p} \pm \frac{\sqrt{-5}}{p} \notin \mathbb{Z}[\sqrt{-5}]$. By unique prime ideal factorization in $\mathbb{Z}[\sqrt{-5}]$,
it follows that $(p)$ is not a prime ideal of $\mathbb{Z}[\sqrt{-5}]$, and hence it has a
nontrivial prime ideal factorization.

The easy case is where one factor is a principal ideal, say $(a + b\sqrt{-5})$.
Here we can argue as we did in $\mathbb{Z}[i]$ back in Section 6.3:

$$(p) = (a + b\sqrt{-5})\,C \quad \text{for some nontrivial ideal } C,$$

hence, taking conjugates,

$$(p) = (a - b\sqrt{-5})\,\bar{C} \quad \text{since } \bar{p} = p.$$

Multiplying the last two equations gives

$$(p^2) = (a^2 + 5b^2)\,C\bar{C} = (a^2 + 5b^2)(k),$$

for some $k \in \mathbb{Z}$, evaluating the product $C\bar{C}$ of the conjugate ideals as in
Section 12.4. But then

$$p^2 = (a^2 + 5b^2)k \quad \text{is a nontrivial factorization in } \mathbb{Z},$$

and therefore $p = a^2 + 5b^2$.

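The harder case is where all prime factors of $(p)$ are nonprincipal ideals,
and hence of the form $(\alpha, \alpha(1 + \sqrt{-5})/2)$ for some $\alpha \in \mathbb{Z}[\sqrt{-5}]$, by
the previous section. Suppose

$$(p) = (\alpha,\ \alpha(1 + \sqrt{-5})/2)\,C.$$

Taking conjugates again, we obtain

$$(p) = (\bar{\alpha},\ \bar{\alpha}(1 - \sqrt{-5})/2)\,\bar{C},$$

and multiplying the last two equations:

$$(p^2) = (\alpha,\ \alpha(1 + \sqrt{-5})/2)\,(\bar{\alpha},\ \bar{\alpha}(1 - \sqrt{-5})/2)\,C\bar{C}.$$

Now $(\alpha, \alpha(1 + \sqrt{-5})/2)(\bar{\alpha}, \bar{\alpha}(1 - \sqrt{-5})/2) = (\alpha\bar{\alpha}/2)$, since its generators
$\alpha\bar{\alpha}$, $\alpha\bar{\alpha}(1 \pm \sqrt{-5})/2$, and $3\alpha\bar{\alpha}/2$ are all multiples of $\alpha\bar{\alpha}/2$, and
$\alpha\bar{\alpha}/2 = 3\alpha\bar{\alpha}/2 - \alpha\bar{\alpha}$. And $C\bar{C} = (k)$ as before, so we have

$$2p^2 = \alpha\bar{\alpha} \cdot k = (a^2 + 5b^2)k \quad \text{for some } a, b, k \in \mathbb{Z}.$$

It follows that $p = a^2 + 5b^2$ or else $2p = a^2 + 5b^2$, for some $a, b \in \mathbb{Z}$.

The latter possibility is ruled out by congruences mod 20. From the
values of $x^2$ and $5y^2$ mod 20 found above, we see that the possible even
values of $a^2 + 5b^2$ mod 20 are 0, 4, 6, 10, 14, 16. None of these match the
values of $2p$, namely $40n + 2$ or $40n + 18$. Hence in all cases the prime
$p = 20n + 1$ or $20n + 9$ is of the form $x^2 + 5y^2$. □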
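
The theorem can be tested numerically over a range of primes. The sketch below uses hypothetical helper names and an arbitrary bound of 2000. The prime $5 = 0^2 + 5 \cdot 1^2$ is set aside here, since the congruence argument only concerns primes not divisible by 5.

```python
from math import isqrt


def is_prime(n):
    return n > 1 and all(n % d for d in range(2, isqrt(n) + 1))


def is_x2_plus_5y2(n):
    """Search for integers x, y >= 0 with n = x^2 + 5*y^2."""
    y = 0
    while 5 * y * y <= n:
        r = n - 5 * y * y
        if isqrt(r) ** 2 == r:
            return True
        y += 1
    return False


primes = [p for p in range(2, 2000) if is_prime(p) and p != 5]
# Primes where representability disagrees with the condition p = 20n+1 or 20n+9.
mismatches = [p for p in primes if is_x2_plus_5y2(p) != (p % 20 in (1, 9))]
print(mismatches)
```

An empty list of mismatches is what the theorem predicts; for instance $29 = 3^2 + 5 \cdot 2^2$ and $41 = 6^2 + 5 \cdot 1^2$.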


Exercises 


We have now classified the primes of the forms $x^2 + y^2$, $x^2 + 2y^2$, $x^2 + 3y^2$, and
$x^2 + 5y^2$. Why did we skip the form $x^2 + 4y^2$?


12.8.1 Show that the primes of the form $x^2 + 4y^2$ are the same as those of the
form $x^2 + y^2$, with one exception.


Primes of the form $x^2 + 6y^2$ can be found in much the same way as those of
the form $x^2 + 5y^2$. For the hard step (where the prime ideal factors of $(p)$ are all
nonprincipal) we use the determination of nonprincipal ideals of $\mathbb{Z}[\sqrt{-6}]$ from the
previous exercise set.


12.8.2 Use congruences mod 24 to show that any prime of the form $x^2 + 6y^2$ is of
the form $24n + 1$ or $24n + 7$.


12.8.3 Use quadratic reciprocity to show that, for any prime $p$ of the form $24n + 1$
or $24n + 7$, $-6$ is a square mod $p$.


12.8.4 Deduce from Exercise 12.8.3 that $(p)$ is not a prime ideal of $\mathbb{Z}[\sqrt{-6}]$ when
$p$ is a prime of the form $24n + 1$ or $24n + 7$.


12.8.5 When $(p)$ has a prime ideal factor of the form $(a + b\sqrt{-6})$, show that
$p = a^2 + 6b^2$.


12.8.6 When all prime ideal factors of $(p)$ are nonprincipal, hence of the shape
$(\alpha, \alpha\sqrt{-6}/2)$ by Exercise 12.7.2, show that $p$ has the form $x^2 + 6y^2$ in this
case also.


12.9 Discussion


The treatment of rings, ideals, and quotient rings at the beginning of this 
chapter is probably the minimum required for significant applications to 
number theory. Along with some group theory and field theory, as found 
in Elements of Algebra, it should give enough algebraic background to 
read classical treatments of algebraic number theory, such as Hecke (1981). 
There one will find the theorem on unique prime factorization for the ideals 
of an arbitrary number field $\mathbb{Q}(\alpha)$, where $\alpha$ is an algebraic number, and the
finiteness of the class number. These two theorems were first proved by 
Dedekind (1871), and his exposition of them in Dedekind (1877) is still 
worth reading, though not quite as streamlined as modern accounts. The 
finiteness of the class number, in particular, is usually proved today with 
the help of Minkowski’s geometry of numbers. 

The class number has a long history, beginning with the Lagrange 
(1773) idea of reducing binary quadratic forms. In the case of positive de- 
terminant, Lagrange gave an algorithm that finds the “simplest” equivalent 
of a given form, and at the same time finds a complete list of inequivalent 
forms. Thus the algorithm determines the number of inequivalent forms 
with given determinant D, in other words, the class number for D. 

Gauss (1801) extended Lagrange’s algorithm to binary quadratic forms 
with negative determinant, exploiting the inherent periodicity of such forms 
that we observed in Chapter 5. Finding a formula for the class number was 
much more difficult, and it was a turning point in the history of number
theory when Dirichlet (1839) succeeded. His method uses L-series, a
sophisticated generalization of the ζ-function discovered by Euler (and
mentioned in Section 2.9). No substantially simpler approach has ever been
found, and indeed L-series seem to be the right tool for the job. Dirichlet 
also used them to prove his theorem on primes in arithmetic progressions 
(see Section 9.9 for the background to this theorem). Both the class num- 
ber formula and the theorem on primes may be found in Dirichlet’s lectures 
on number theory, Dirichlet (1863), available in English translation. 

Other approaches to the class number involve equally sophisticated 
mathematics, such as modular functions. The classical modular function 
j(z) is a function defined on the upper half plane of C, with the periodicity 
indicated by Figure 12.3, the modular tessellation. 


Figure 12.3: The modular tessellation


The function $j$ maps the shaded region one-to-one onto $\mathbb{C}$ (one half
onto the upper half plane, and the other half onto the lower), and repeats
its values in every other region of the tessellation. The precise way to say
this is that if $a, b, c, d \in \mathbb{Z}$ and $ad - bc = \pm 1$, then

$$j\left(\frac{az + b}{cz + d}\right) = j(z).$$


This periodicity property allows $j$ to be viewed as a function of lattice
shapes, as we now explain.

A lattice generated by $\omega_1, \omega_2 \in \mathbb{C}$ has “shape” given by the complex
number $\omega_1/\omega_2$, because $|\omega_1/\omega_2| = |\omega_1|/|\omega_2|$ is the ratio of side lengths
of a generating parallelogram, while $\arg \omega_1/\omega_2$ is the angle between the
sides. However, the same lattice is generated by $a\omega_1 + b\omega_2$, $c\omega_1 + d\omega_2$ for
any $a, b, c, d \in \mathbb{Z}$ with $ad - bc = \pm 1$, and hence its “shape” is represented
as well by the number $(a\omega_1 + b\omega_2)/(c\omega_1 + d\omega_2) = (a\omega + b)/(c\omega + d)$, where $\omega = \omega_1/\omega_2$.
The shaded region of the modular tessellation contains exactly one
representative of each lattice shape, so $j$ gives each shape a different value, but
(by periodicity) $j$ takes the same value at each representative of a lattice
shape.

Because of this, $j$ may have something to say about the class number
of imaginary quadratic fields, which (as we saw in Section 12.7) is the
number of shapes of ideals of the field. Kronecker (1857) discovered that
it does: the class number of $\mathbb{Q}(\sqrt{d})$ is the degree of $j(\sqrt{d})$. For example,
with $d = -1$ we get the ordinary integer


j(i) = 1728 


which is of degree 1, confirming that the Gaussian integers have class
number 1 (that is, all ideals are principal). Likewise


$$j\left(\frac{1 + \sqrt{-15}}{2}\right) = \frac{-191025 - 85995\sqrt{5}}{2}$$


which is of degree 2, showing that Q(,/—15) has class number 2. For 
proofs of Kronecker’s amazing theorem, see McKean and Moll (1997) or 
Cox (1989). 

Cox’s book, in fact, is a good sequel to this one, because it describes 
the complete classification of primes of the form x* + ny*. This involves 
not only more sophisticated algebra (class field theory), but also modular 
functions and related topics from analysis. Another book worth mentioning 
is Scharlau and Opolka (1985). As its title says, it covers the development 
of number theory “from Fermat to Minkowski”, with particular emphasis
on quadratic forms. It is in some ways complementary to this one, since 
it says little about ideal theory but is strong on analysis and the geometry 
of numbers. The latter topics are essential for anyone who wants to master 
number theory beyond the elements covered here.